PechaBridgeOCR

Tibetan OCR model (~200M parameters, DONUT / VisionEncoderDecoder architecture) fine-tuned on Tibetan pecha line images with PechaBridge.

Important: This model requires a custom BDRC-style grayscale preprocessing pipeline (adaptive binarization, background normalisation, aspect-preserving resize and padding). Using AutoImageProcessor directly will apply standard ImageNet normalisation and produce poor results. Use PechaBridge's batch-ocr CLI for correct end-to-end inference.
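The "aspect-preserving resize and padding" step named above can be illustrated with a small geometry helper. This is a minimal sketch, not PechaBridge's implementation: the function name `fit_and_pad_geometry`, the centred placement, and the 1000x128 canvas in the usage line are assumptions chosen for illustration. The real target size and padding policy come from the PechaBridge preprocessing config.

```python
def fit_and_pad_geometry(src_w, src_h, target_w, target_h):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    The image is scaled by one factor so that it fits inside the target
    canvas; the leftover area is padding. Centring is an assumption here.
    """
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # A single scale factor for both axes keeps the aspect ratio intact.
    scale = min(target_w / src_w, target_h / src_h)
    new_w = max(1, min(target_w, round(src_w * scale)))
    new_h = max(1, min(target_h, round(src_h * scale)))
    # Split the leftover space evenly (hypothetical policy).
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A wide 2000x100 line crop placed on a hypothetical 1000x128 canvas.
print(fit_and_pad_geometry(2000, 100, 1000, 128))  # (1000, 50, 0, 39)
```

Stretching the crop to the canvas instead would distort glyph proportions, which is one reason a generic image processor is a poor substitute for the dedicated pipeline.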

Recommended usage — PechaBridge CLI

# 1. Clone PechaBridge and install dependencies
git clone https://github.com/CodexAITeam/PechaBridge.git && cd PechaBridge
pip install -r requirements.txt

# 2. Download this model (and the line segmentation model)
python cli.py download-models

# 3a. Run batch OCR on a folder of pecha page images
python cli.py batch-ocr \
    --ocr-model     models/ocr/PechaBridgeOCR \
    --line-model    models/line_segmentation/PechaBridgeLineSegmentation.pt \
    --layout-engine yolo_line \
    --ocr-engine    donut \
    --input-dir     /path/to/pecha/images

# 3b. Or use the BDRC layout engine (auto-downloads BDRC line models)
python cli.py batch-ocr \
    --ocr-model     models/ocr/PechaBridgeOCR \
    --layout-engine bdrc_line \
    --ocr-engine    donut \
    --input-dir     /path/to/pecha/images

# 3c. Download + OCR a Staatsbibliothek zu Berlin pecha in one command
python cli.py batch-ocr \
    --ppn           337138764X \
    --ocr-model     models/ocr/PechaBridgeOCR \
    --layout-engine bdrc_line \
    --ocr-engine    donut

Each image produces a .txt transcript and an *_overlay.jpg with detected line boxes drawn on the source image.
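For downstream processing, the transcripts and overlays can be gathered with a few lines of standard-library Python. This is a sketch under stated assumptions: the helper name `collect_ocr_outputs` is hypothetical, and it assumes that each transcript is named `<stem>.txt`, that its overlay is `<stem>_overlay.jpg` in the same directory, and that transcripts are UTF-8 encoded. Check the actual output location and naming of your PechaBridge run before relying on it.

```python
from pathlib import Path

OVERLAY_SUFFIX = "_overlay"  # assumed from the "*_overlay.jpg" naming


def collect_ocr_outputs(output_dir):
    """Map each page stem to its transcript text and overlay path (or None)."""
    output_dir = Path(output_dir)
    results = {}
    for txt_path in sorted(output_dir.glob("*.txt")):
        # Assumed pairing: page.txt <-> page_overlay.jpg in the same folder.
        overlay = txt_path.with_name(txt_path.stem + OVERLAY_SUFFIX + ".jpg")
        results[txt_path.stem] = {
            "text": txt_path.read_text(encoding="utf-8"),
            "overlay": overlay if overlay.exists() else None,
        }
    return results
```

A missing overlay is reported as `None` rather than raising, so a partially completed batch can still be inspected.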

Advanced: standalone Python usage

If you need to call the model directly from Python, you must apply the BDRC-style preprocessing yourself before passing pixel values to the model. See pechabridge/ocr/preprocess_bdrc.py in the PechaBridge repository for the full preprocessing pipeline; this checkpoint was trained with the gray pipeline.

# Minimal example — BDRC gray preprocessing applied explicitly
import torch
from transformers import VisionEncoderDecoderModel, AutoTokenizer
from pechabridge.ocr.preprocess_bdrc import (
    BDRCPreprocessConfig,
    preprocess_image_bdrc,       # returns grayscale PIL Image (mode 'L')
    bdrc_image_to_normalized_tensor,  # grayscale PIL → float32 HW in [-1, 1]
)
from PIL import Image
import numpy as np

model = VisionEncoderDecoderModel.from_pretrained("TibetanCodexAITeam/PechaBridgeOCR")
tokenizer = AutoTokenizer.from_pretrained("TibetanCodexAITeam/PechaBridgeOCR")

# 1. Load the line crop as RGB
image = Image.open('line_crop.png').convert('RGB')

# 2. Apply BDRC preprocessing:
#    - converts to grayscale (luma by default)
#    - optionally normalises background, binarises, pads/resizes
#    Returns a grayscale PIL Image (mode 'L')
cfg = BDRCPreprocessConfig.ocr_line_defaults()  # defaults used during training
gray_pil = preprocess_image_bdrc(image, cfg)

# 3. Normalise to float32 in [-1, 1] and build a 3-channel tensor
#    (model expects C=3; replicate the single gray channel)
#    Pass the preprocessed grayscale image from step 2, not the raw RGB image.
gray_hw = bdrc_image_to_normalized_tensor(gray_pil, cfg)  # shape: (H, W), float32
pixel_values = torch.tensor(gray_hw).unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
pixel_values = pixel_values.expand(-1, 3, -1, -1)               # (1, 3, H, W)

generated_ids = model.generate(pixel_values)
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

Model details

Checkpoint: checkpoint-154000 (step 154000)
Image preprocessing pipeline: gray
Repro bundle included: yes — preprocessing config is in repro/
Architecture: Swin Transformer encoder (hidden_size=768, 12 layers) + BART decoder (d_model=1024, 12 layers), ~200M parameters total
Training framework: PechaBridge
Training data: Tibetan pecha line images from OpenPecha and BDRC collections

Weights: Safetensors, F32 (Hub-reported model size: 0.3B params)
