PechaBridgeOCR

Tibetan OCR model (~200M parameters, DONUT / VisionEncoderDecoder architecture) fine-tuned on Tibetan pecha line images with PechaBridge.

Important: This model requires a custom BDRC-style grayscale preprocessing pipeline (adaptive binarization, background normalisation, aspect-preserving resize and padding). Using AutoImageProcessor directly will apply standard ImageNet normalisation and produce poor results. Use PechaBridge's batch-ocr CLI for correct end-to-end inference.
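The sketch below illustrates why input statistics matter. It covers two of the steps named above: aspect-preserving resize with padding, and scaling 8-bit gray values to [-1, 1], which is the range this card's standalone example attributes to the PechaBridge helpers.

It does not reproduce adaptive binarisation or background normalisation. The names `resize_and_pad` and `to_unit_range`, the 64×512 target size, the white fill, vertical centring and bilinear resampling are all hypothetical choices, not PechaBridge's actual settings. For real inference, use `pechabridge.ocr.preprocess_bdrc`.

For comparison, standard ImageNet normalisation maps a white pixel in the red channel to (1 - 0.485) / 0.229 ≈ 2.25 rather than 1.0.

```python
import numpy as np
from PIL import Image

# Hypothetical target size; the real values come from PechaBridge's BDRCPreprocessConfig.
TARGET_H, TARGET_W = 64, 512


def resize_and_pad(gray: Image.Image, target_h: int = TARGET_H,
                   target_w: int = TARGET_W, fill: int = 255) -> Image.Image:
    """Fit a line image into target_h x target_w without distorting it.

    One scale factor is used for both axes (aspect ratio preserved); the
    resized line is left-aligned and vertically centred, and the remaining
    area is filled with `fill` (255 = white background, an assumption).
    """
    gray = gray.convert("L")
    w, h = gray.size
    scale = min(target_h / h, target_w / w)
    # Clamp so floating-point rounding can never overflow the canvas.
    new_w = max(1, min(target_w, round(w * scale)))
    new_h = max(1, min(target_h, round(h * scale)))
    resized = gray.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("L", (target_w, target_h), color=fill)
    canvas.paste(resized, (0, (target_h - new_h) // 2))
    return canvas


def to_unit_range(gray: Image.Image) -> np.ndarray:
    """Map 8-bit gray values [0, 255] to float32 [-1, 1] via x / 127.5 - 1."""
    arr = np.asarray(gray, dtype=np.float32)
    return arr / 127.5 - 1.0
```

With these helpers, `to_unit_range(resize_and_pad(Image.open("line_crop.png")))` yields a (64, 512) float32 array. In that array, black ink is -1.0 and white padding is 1.0.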

Recommended usage: PechaBridge CLI

# 1. Clone PechaBridge and install dependencies
git clone https://github.com/CodexAITeam/PechaBridge.git && cd PechaBridge
pip install -r requirements.txt

# 2. Download this model (and the line segmentation model)
python cli.py download-models

# 3a. Run batch OCR on a folder of pecha page images
python cli.py batch-ocr \
    --ocr-model     models/ocr/PechaBridgeOCR \
    --line-model    models/line_segmentation/PechaBridgeLineSegmentation.pt \
    --layout-engine yolo_line \
    --ocr-engine    donut \
    --input-dir     /path/to/pecha/images

# 3b. Or use the BDRC layout engine (auto-downloads BDRC line models)
python cli.py batch-ocr \
    --ocr-model     models/ocr/PechaBridgeOCR \
    --layout-engine bdrc_line \
    --ocr-engine    donut \
    --input-dir     /path/to/pecha/images

# 3c. Or download a Staatsbibliothek zu Berlin pecha by its PPN and OCR it in one command
python cli.py batch-ocr \
    --ppn           337138764X \
    --ocr-model     models/ocr/PechaBridgeOCR \
    --layout-engine bdrc_line \
    --ocr-engine    donut

Each image produces a .txt transcript and an *_overlay.jpg with detected line boxes drawn on the source image.
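The card does not say where `batch-ocr` writes its outputs. The helper below gathers the results for downstream processing, under these assumptions:

- each page's transcript and overlay end up in one output directory;
- they share the source image's file stem (for example `page_001.txt` and `page_001_overlay.jpg`);
- the `.txt` files are UTF-8.

The helper name `collect_transcripts` is hypothetical and is not part of PechaBridge.

```python
from pathlib import Path


def collect_transcripts(output_dir: str) -> dict[str, dict[str, object]]:
    """Map each page stem to its transcript text and overlay path (or None).

    Assumes <stem>.txt and <stem>_overlay.jpg sit side by side in output_dir.
    """
    results: dict[str, dict[str, object]] = {}
    for txt in sorted(Path(output_dir).glob("*.txt")):
        overlay = txt.with_name(f"{txt.stem}_overlay.jpg")
        results[txt.stem] = {
            "text": txt.read_text(encoding="utf-8"),
            "overlay": overlay if overlay.exists() else None,
        }
    return results
```

Missing overlays come back as `None` rather than raising, so pages that were transcribed without an overlay are still collected.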

Advanced: standalone Python usage

If you need to call the model directly from Python, you must apply the BDRC-style preprocessing yourself before passing pixel values to the model. The full preprocessing pipeline is in pechabridge/ocr/preprocess_bdrc.py in the PechaBridge repository; this checkpoint uses the gray pipeline.

# Minimal example: BDRC gray preprocessing applied explicitly
import torch
from PIL import Image
from transformers import AutoTokenizer, VisionEncoderDecoderModel
from pechabridge.ocr.preprocess_bdrc import (
    BDRCPreprocessConfig,
    preprocess_image_bdrc,            # returns a grayscale PIL Image (mode 'L')
    bdrc_image_to_normalized_tensor,  # grayscale PIL → float32 (H, W) in [-1, 1]
)

model = VisionEncoderDecoderModel.from_pretrained("TibetanCodexAITeam/PechaBridgeOCR")
tokenizer = AutoTokenizer.from_pretrained("TibetanCodexAITeam/PechaBridgeOCR")
model.eval()  # inference mode (disables dropout)

# 1. Load the line crop as RGB
image = Image.open("line_crop.png").convert("RGB")

# 2. Apply BDRC preprocessing:
#    - converts to grayscale (luma by default)
#    - optionally normalises background, binarises, pads/resizes
#    Returns a grayscale PIL Image (mode 'L')
cfg = BDRCPreprocessConfig.ocr_line_defaults()  # defaults used during training
gray_pil = preprocess_image_bdrc(image, cfg)

# 3. Normalise the *preprocessed* gray image to float32 in [-1, 1] and build a
#    3-channel tensor (the model expects C=3, so the gray channel is replicated)
gray_hw = bdrc_image_to_normalized_tensor(gray_pil, cfg)  # shape: (H, W), float32
pixel_values = torch.as_tensor(gray_hw, dtype=torch.float32)[None, None]  # (1, 1, H, W)
pixel_values = pixel_values.expand(-1, 3, -1, -1)                        # (1, 3, H, W)

# 4. Generate and decode without tracking gradients
with torch.inference_mode():
    generated_ids = model.generate(pixel_values)
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
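To transcribe several line crops in one `generate` call, the preprocessed (H, W) arrays can be stacked into an (N, 3, H, W) batch. This is a sketch under an assumption: the preprocessing pads every crop to the same fixed size, which the card's "resize and padding" wording suggests but does not state. The helper `stack_gray_lines` is hypothetical. It refuses mismatched shapes rather than guessing a padding value.

```python
import numpy as np


def stack_gray_lines(lines: list[np.ndarray]) -> np.ndarray:
    """Stack preprocessed (H, W) float32 arrays into an (N, 3, H, W) batch.

    The single gray channel is replicated three times, as in the
    single-image example. All inputs must already share one (H, W) shape.
    """
    if not lines:
        raise ValueError("need at least one line")
    shapes = {tuple(np.shape(a)) for a in lines}
    if len(shapes) != 1:
        raise ValueError(f"all lines must share one (H, W) shape, got {sorted(shapes)}")
    batch = np.stack([np.asarray(a, dtype=np.float32) for a in lines])  # (N, H, W)
    return np.repeat(batch[:, None, :, :], 3, axis=1)                 # (N, 3, H, W)
```

To use it with the model and tokenizer from the example above, pass `torch.from_numpy(stack_gray_lines(arrays))` to `model.generate`. Then call `tokenizer.batch_decode(..., skip_special_tokens=True)`, which returns one string per line in input order.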

Model details

  • Checkpoint: checkpoint-154000 (step 154000)
  • Image preprocessing pipeline: gray
  • Repro bundle included: yes (the preprocessing config is in repro/)
  • Architecture: Swin Transformer encoder (hidden_size=768, 12 layers) + BART decoder (d_model=1024, 12 layers), ~200M parameters total
  • Training framework: PechaBridge
  • Training data: Tibetan pecha line images from OpenPecha and BDRC collections
  • Weights: Safetensors, tensor type F32 (Hub-reported model size: 0.3B params)