PechaBridgeOCR
Tibetan OCR model (~200M parameters, DONUT / VisionEncoderDecoder architecture) fine-tuned on Tibetan pecha line images with PechaBridge.
Important: This model requires a custom BDRC-style grayscale preprocessing pipeline (adaptive binarization, background normalisation, aspect-preserving resize and padding). Using AutoImageProcessor directly will apply standard ImageNet normalisation and produce poor results. Use PechaBridge's batch-ocr CLI for correct end-to-end inference.
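To make the normalisation mismatch concrete, the sketch below compares what standard ImageNet normalisation and a symmetric [-1, 1] scaling do to the same 8-bit gray value. The ImageNet mean/std constants are the commonly used ones; the `value / 127.5 - 1` formula is only an assumed form of the gray pipeline's scaling (the card states the target range, not the exact formula), and both function names are hypothetical.

```python
# Commonly used ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def imagenet_normalize(value):
    """Return the three channel values ImageNet normalisation yields for one 8-bit gray value."""
    x = value / 255.0
    return tuple((x - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))


def symmetric_normalize(value):
    """Map one 8-bit gray value linearly onto [-1, 1] (assumed form, see lead-in)."""
    return value / 127.5 - 1.0


# Print both mappings for black, mid-gray, and white pixels.
for v in (0, 128, 255):
    imagenet = [round(c, 3) for c in imagenet_normalize(v)]
    print(v, imagenet, round(symmetric_normalize(v), 3))
```

Under ImageNet statistics a white pixel lands above 2 and the three channels no longer agree, whereas the symmetric mapping keeps every pixel inside [-1, 1] with identical channels. A model trained on the latter sees out-of-distribution inputs when fed the former.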
Recommended usage: PechaBridge CLI
# 1. Clone PechaBridge and install dependencies
git clone https://github.com/CodexAITeam/PechaBridge.git && cd PechaBridge
pip install -r requirements.txt
# 2. Download this model (and the line segmentation model)
python cli.py download-models
# 3a. Run batch OCR on a folder of pecha page images
python cli.py batch-ocr \
  --ocr-model models/ocr/PechaBridgeOCR \
  --line-model models/line_segmentation/PechaBridgeLineSegmentation.pt \
  --layout-engine yolo_line \
  --ocr-engine donut \
  --input-dir /path/to/pecha/images
# 3b. Or use the BDRC layout engine (auto-downloads BDRC line models)
python cli.py batch-ocr \
  --ocr-model models/ocr/PechaBridgeOCR \
  --layout-engine bdrc_line \
  --ocr-engine donut \
  --input-dir /path/to/pecha/images
# 3c. Download + OCR a Staatsbibliothek zu Berlin pecha in one command
python cli.py batch-ocr \
  --ppn 337138764X \
  --ocr-model models/ocr/PechaBridgeOCR \
  --layout-engine bdrc_line \
  --ocr-engine donut
Each image produces a .txt transcript and an *_overlay.jpg with detected line boxes drawn on the source image.
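If the transcripts need to be post-processed, a small helper can pair each transcript with its overlay. The sketch below is not part of PechaBridge. It assumes that all outputs land in one flat folder and that the overlay for `<stem>.txt` is named `<stem>_overlay.jpg`; the card only states the `*_overlay.jpg` suffix, so check the actual output layout first. The function name `pair_outputs` is hypothetical.

```python
from pathlib import Path


def pair_outputs(output_dir):
    """Return {stem: (transcript_path, overlay_path_or_None)} for one output folder.

    Assumes a flat folder and the naming scheme <stem>.txt / <stem>_overlay.jpg.
    """
    out = Path(output_dir)
    pairs = {}
    for txt in sorted(out.glob("*.txt")):
        overlay = txt.with_name(f"{txt.stem}_overlay.jpg")
        # Record None when no matching overlay exists, instead of failing.
        pairs[txt.stem] = (txt, overlay if overlay.exists() else None)
    return pairs


# Example (hypothetical path):
# for stem, (txt, overlay) in pair_outputs("/path/to/ocr/output").items():
#     print(stem, txt.read_text(encoding="utf-8")[:40], overlay)
```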
Advanced: standalone Python usage
If you need to call the model directly from Python, you must apply the BDRC-style preprocessing manually before passing pixel values to the model. See pechabridge/ocr/preprocess_bdrc.py in the PechaBridge repository for the full preprocessing pipeline (the gray pipeline was used for this checkpoint).
# Minimal example: BDRC gray preprocessing applied explicitly
import torch
from transformers import VisionEncoderDecoderModel, AutoTokenizer
from pechabridge.ocr.preprocess_bdrc import (
    BDRCPreprocessConfig,
    preprocess_image_bdrc,            # returns grayscale PIL Image (mode 'L')
    bdrc_image_to_normalized_tensor,  # grayscale PIL -> float32 HW in [-1, 1]
)
from PIL import Image
import numpy as np
model = VisionEncoderDecoderModel.from_pretrained("TibetanCodexAITeam/PechaBridgeOCR")
tokenizer = AutoTokenizer.from_pretrained("TibetanCodexAITeam/PechaBridgeOCR")
# 1. Load the line crop as RGB
image = Image.open('line_crop.png').convert('RGB')
# 2. Apply BDRC preprocessing:
# - converts to grayscale (luma by default)
# - optionally normalises background, binarises, pads/resizes
# Returns a grayscale PIL Image (mode 'L')
cfg = BDRCPreprocessConfig.ocr_line_defaults() # defaults used during training
gray_pil = preprocess_image_bdrc(image, cfg)
# 3. Normalise to float32 in [-1, 1] and build a 3-channel tensor
# (model expects C=3; replicate the single gray channel)
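# Pass the preprocessed grayscale image from step 2; the helper is documented above as taking a grayscale PIL image
gray_hw = bdrc_image_to_normalized_tensor(gray_pil, cfg) # shape: (H, W), float32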
pixel_values = torch.tensor(gray_hw).unsqueeze(0).unsqueeze(0) # (1, 1, H, W)
pixel_values = pixel_values.expand(-1, 3, -1, -1) # (1, 3, H, W)
generated_ids = model.generate(pixel_values)
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
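The "pads/resizes" part of step 2 refers to an aspect-preserving resize followed by padding. The geometry can be sketched without any image library, as below. This illustrates the general technique only, not the code in preprocess_bdrc.py: the function name `fit_with_padding` and the 1000x80 canvas are hypothetical, and how the leftover padding is distributed (one side or both) is left open because the card does not specify it.

```python
def fit_with_padding(src_w, src_h, canvas_w, canvas_h):
    """Scale (src_w, src_h) to fit inside the canvas without distortion.

    Returns (new_w, new_h, pad_w, pad_h): the resized dimensions and the
    total leftover pixels along each axis that padding has to fill.
    """
    if min(src_w, src_h, canvas_w, canvas_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(canvas_w / src_w, canvas_h / src_h)
    # Clamp so rounding can never overshoot the canvas or collapse to zero.
    new_w = max(1, min(canvas_w, round(src_w * scale)))
    new_h = max(1, min(canvas_h, round(src_h * scale)))
    return new_w, new_h, canvas_w - new_w, canvas_h - new_h


# A long, thin line crop is limited by width and needs vertical padding.
print(fit_with_padding(2000, 100, 1000, 80))  # -> (1000, 50, 0, 30)
# A short crop is limited by height and needs horizontal padding.
print(fit_with_padding(400, 100, 1000, 80))   # -> (320, 80, 680, 0)
```

Stretching each crop to the full canvas would instead distort glyph proportions, which is why the aspect ratio is preserved and the remainder is padded.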
Model details
- Checkpoint: checkpoint-154000 (step 154000)
- Image preprocessing pipeline: gray
- Repro bundle included: yes; preprocessing config is in repro/
- Architecture: Swin Transformer encoder (hidden_size=768, 12 layers) + BART decoder (d_model=1024, 12 layers), ~200M parameters total
- Training framework: PechaBridge
- Training data: Tibetan pecha line images from OpenPecha and BDRC collections