GutenOCR-3B-AIO-GGUF
GutenOCR-3B from rootsautomation is a 3B-parameter grounded OCR vision-language model fine-tuned from Qwen2.5-VL-3B. It belongs to the GutenOCR family, which is designed as a unified front-end for full-page reading, text detection, and grounding. The task is selected through prompt-specified input-output schemas, and the model supports line- and paragraph-level bounding boxes as well as conditional "where is x?" queries on business documents, scientific articles, and synthetic data.

Across 10.5K held-out pages, it more than doubles the composite grounded OCR score of its Qwen2.5-VL backbone (0.348→0.811, roughly 2.3x). On the Fox and OmniDocBench v1.5 benchmarks it substantially improves region- and line-level character error rate (CER) and text-detection F1, while revealing trade-offs in page linearization, color-guided OCR, and formula-heavy layouts.

A single checkpoint handles reading, detection, and grounding, which suits business workflows and human-in-the-loop verification. GGUF quantizations are available for efficient deployment; Q4_K_S and Q4_K_M are recommended, at 1.9-2.0GB.
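For readers unfamiliar with the CER metric mentioned above, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between the reference text and the OCR output, divided by the reference length. This is a generic illustration, not the GutenOCR evaluation code; the function names and the sample strings are hypothetical, and the benchmarks may apply text normalization that is not shown here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete from ref
                            curr[j - 1] + 1,      # insert into ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length.

    Assumption for this sketch: an empty reference scores 0.0 against an
    empty hypothesis and 1.0 otherwise.
    """
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical OCR output with one confused character ("0" read as "O").
    print(cer("Invoice 1042", "Invoice 1O42"))
```

Lower is better: a CER of 0 means the output matches the reference exactly, and the value can exceed 1 when the output is much longer than the reference.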
Quants Usage
(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)
[Figure: graph by ikawrakow comparing some lower-quality quant types; lower is better.]
Model tree for prithivMLmods/GutenOCR-3B-AIO-GGUF
Base model: Qwen/Qwen2.5-VL-3B-Instruct