TMBLD-YOLO26m: Tibetan Modern Book Layout Detection

A fine-tuned YOLO26m object-detection model for Tibetan modern book layout detection. The model detects four layout classes in page images of modern Tibetan books: header, Text area, footnote, and footer.

Model Description

This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the BDRC/TDLA-Training-Dataset, a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

Property Value
Architecture YOLO26m
Task Object Detection
Image size 640 × 640
Number of classes 4
Training platform Ultralytics HUB
Weights file Tibetan_modern_book_Layout_detection.pt

Classes

ID  Class      Description
0   header     Page header region
1   Text area  Main body text region
2   footnote   Footnote region
3   footer     Page footer region
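
For post-processing code that does not have the model object at hand, the class table above can be mirrored as a plain mapping. This is a minimal sketch: `CLASS_NAMES` and `class_name` are illustrative names, not part of the released model, and the loaded model's own `names` attribute remains the authoritative source.

```python
# Class IDs and names copied verbatim from the class table above.
CLASS_NAMES = {
    0: "header",
    1: "Text area",
    2: "footnote",
    3: "footer",
}


def class_name(cls_id: int) -> str:
    """Return the layout class name for a predicted class ID (hypothetical helper)."""
    return CLASS_NAMES[cls_id]


print(class_name(1))
```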

Performance

Evaluated on the validation split of the TDLA Training Dataset.

Metric Value
Precision 0.966
Recall 0.970
mAP@0.5 0.982
mAP@0.5:0.95 0.799
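
The model card does not report an F1 score, but one can be derived from the precision and recall above as their harmonic mean. The sketch below shows only that arithmetic; the resulting figure is a derived value, not a separately measured one, and `f1_score` is an illustrative helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in the metrics table above.
f1 = f1_score(0.966, 0.970)
print(f"F1 = {f1:.3f}")  # F1 = 0.968
```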

Training Loss (final epoch)

Loss Component       Train  Val
Box loss             0.515  0.643
Classification loss  0.218  0.276
DFL loss             0.003  0.004

Training Details

Dataset

  • Dataset: BDRC/TDLA-Training-Dataset
  • Train images: 2,692
  • Val images: 103
  • Test images: 313
  • Total annotations: 14,705
  • Train/Val split: Iterative multi-label stratification (seed 42, 80/20 ratio)
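
To see what share of the images each split holds, the listed counts can be converted to fractions. The sketch below only reports what the counts above work out to; it makes no claim about how the 80/20 ratio was applied during splitting, and the names used are illustrative.

```python
# Image counts per split, copied from the list above.
SPLIT_COUNTS = {"train": 2692, "val": 103, "test": 313}


def split_fractions(counts: dict) -> dict:
    """Return each split's share of the total image count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_images = sum(SPLIT_COUNTS.values())
fractions = split_fractions(SPLIT_COUNTS)

print(f"total: {total_images} images")
for name, frac in fractions.items():
    print(f"{name}: {SPLIT_COUNTS[name]} images ({frac:.1%})")
```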

Hyperparameters

Parameter Value
Epochs 150
Patience 100
Batch size Auto (-1)
Image size 640
Optimizer Auto (SGD)
Initial learning rate (lr0) 0.01
Final learning rate factor (lrf) 0.01
Momentum 0.937
Weight decay 0.0005
Warmup epochs 3.0
Warmup momentum 0.8
Warmup bias lr 0.1
AMP (mixed precision) True
Pretrained True
Deterministic True
Seed 0

Loss Weights

Component Weight
Box 7.5
Classification 0.5
DFL 1.5

Augmentation

Augmentation Value
HSV-Hue 0.015
HSV-Saturation 0.7
HSV-Value 0.4
Translation 0.1
Scale 0.5
Flip left-right 0.5
Mosaic 1.0
Erasing 0.4
Close mosaic (last N epochs) 10
Auto augment RandAugment
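
The model was trained on Ultralytics HUB, so no training script accompanies it. As a rough local equivalent, the hyperparameter, loss-weight, and augmentation tables above map onto keyword arguments of the Ultralytics `train` method. The sketch below is an assumption about how such a run could be set up, not the authors' actual configuration: the dataset YAML path and the base checkpoint filename are placeholders.

```python
# Values copied from the Hyperparameters, Loss Weights, and Augmentation tables.
TRAIN_ARGS = dict(
    epochs=150, patience=100, batch=-1, imgsz=640,
    optimizer="auto", lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005,
    warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1,
    amp=True, pretrained=True, deterministic=True, seed=0,
    box=7.5, cls=0.5, dfl=1.5,                      # loss weights
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,              # colour augmentation
    translate=0.1, scale=0.5, fliplr=0.5,
    mosaic=1.0, erasing=0.4, close_mosaic=10,
    auto_augment="randaugment",
)


def train(data_yaml: str = "tdla.yaml", weights: str = "yolo26m.pt"):
    """Start a training run. Both default paths are placeholders (assumptions)."""
    # Imported lazily so TRAIN_ARGS can be inspected without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train()` with a dataset YAML that points at a local copy of the dataset would start a run; results will not necessarily match the reported metrics, since the HUB environment and the auto-selected batch size may differ.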

Usage

Inference with Ultralytics

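from ultralytics import YOLO

# Load the fine-tuned weights (path to the downloaded .pt file)
model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Run inference on a single page image at the training resolution
results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])       # class index (0-3)
        conf = float(box.conf[0])      # confidence score
        xyxy = box.xyxy[0].tolist()    # [x1, y1, x2, y2] in pixels
        name = result.names[cls_id]    # class name, e.g. "Text area"
        print(f"Class: {name} ({cls_id}), Confidence: {conf:.3f}, Box: {xyxy}")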

Batch Inference

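from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Passing a directory runs inference on every image in it;
# conf=0.25 discards detections with confidence below 0.25
results = model.predict("path/to/images/", imgsz=640, conf=0.25)

for result in results:
    print(result.path, len(result.boxes))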

Intended Use

This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:

  • OCR pipelines on Tibetan documents
  • Document digitization workflows
  • Structured text extraction from scanned Tibetan texts
  • Digital library cataloging and indexing
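
As an illustration of the preprocessing role described above, detections can be filtered to the text-bearing classes and ordered top to bottom before being handed to an OCR engine. This is a minimal sketch under stated assumptions: the detection dictionaries, the `regions_for_ocr` helper, and the choice of which classes to keep are all hypothetical, and the top-to-bottom sort assumes a simple single-column page.

```python
from typing import Dict, List, Sequence


def regions_for_ocr(
    detections: List[Dict],
    keep: Sequence[str] = ("Text area", "footnote"),
    min_conf: float = 0.25,
) -> List[Dict]:
    """Keep confident detections of the chosen classes, sorted by top edge, then left edge.

    Each detection is a dict with "name", "conf", and "xyxy" ([x1, y1, x2, y2]).
    """
    kept = [d for d in detections if d["name"] in keep and d["conf"] >= min_conf]
    return sorted(kept, key=lambda d: (d["xyxy"][1], d["xyxy"][0]))


# Made-up detections for demonstration only.
example = [
    {"name": "footer", "conf": 0.90, "xyxy": [50, 900, 600, 950]},
    {"name": "footnote", "conf": 0.60, "xyxy": [50, 820, 600, 880]},
    {"name": "Text area", "conf": 0.95, "xyxy": [50, 100, 600, 800]},
]
ordered = regions_for_ocr(example)
print([d["name"] for d in ordered])
```

Each returned box can then be used to crop the page image before OCR.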

Limitations

  • Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
  • Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher imgsz values.
  • The footnote class has fewer training samples (456 annotations) than the other classes, which may affect detection quality for that class.
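
The tiling mentioned in the limitations above can be sketched as a function that splits a large page into overlapping windows of the model's input size. This is one possible scheme, not a procedure described by the authors: the `tile_windows` helper and the 64-pixel overlap are assumptions, and merging detections from neighbouring tiles is left out.

```python
from typing import List, Tuple


def tile_windows(
    width: int, height: int, tile: int = 640, overlap: int = 64
) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) windows that together cover a width x height page."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than tile")
    stride = tile - overlap

    def starts(size: int) -> List[int]:
        # A page side no longer than one tile needs a single window.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last window sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


print(tile_windows(1000, 640))
```

Each window can be cropped and passed to `model.predict`, with the window's `x1` and `y1` added back to the predicted box coordinates to return to page coordinates.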

License

This model is released under the CC0 1.0 Universal Public Domain Dedication. You are free to copy, modify, and distribute the model, including for commercial purposes, without asking permission.

Acknowledgements

This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

Citation

If you use this model, please cite it as follows:

@software{bdrc_tmbld_yolo26m_2026,
  title   = {tmbld-YOLO26m: Tibetan Modern book layout detection Model},
  author  = {Buddhist Digital Resource Center (BDRC)},
  year    = {2026},
  url     = {https://huggingface.co/BDRC/TDLA-YOLO26m},
  license = {CC0-1.0}
}