GutenOCR-3B-AIO-GGUF

GutenOCR-3B from rootsautomation is a 3B-parameter grounded OCR vision-language model fine-tuned from Qwen2.5-VL-3B. It belongs to the GutenOCR family, which is designed as a unified front-end for full-page reading, text detection, and grounding through prompt-specified input-output schemas. It supports line- and paragraph-level bounding boxes plus conditional "where is x?" queries on business documents, scientific articles, and synthetic data. Across 10.5K held-out pages it more than doubles the composite grounded OCR score of its Qwen2.5-VL backbone (0.348 to 0.811). On the Fox and OmniDocBench v1.5 benchmarks it substantially improves region- and line-level CER and text-detection F1, while revealing trade-offs in page linearization, color-guided OCR, and formula-heavy layouts. As a single-checkpoint VLM it combines reading, detection, and grounding for business workflows and human-in-the-loop verification. GGUF quantizations are available for efficient deployment; Q4_K_S and Q4_K_M are recommended, at 1.9-2.0 GB.
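
For readers unfamiliar with the CER figures mentioned above, here is a minimal sketch of character error rate in its standard form: Levenshtein edit distance divided by the reference length. The function names are illustrative, and the exact normalization used in the GutenOCR evaluation (whitespace handling, capping, and so on) is not stated in this card, so treat this as the textbook definition rather than the benchmark's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to bound memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length.

    Assumption: an empty reference scores 0.0 against an empty hypothesis
    and 1.0 otherwise; benchmarks may define this edge case differently.
    """
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)
```

A lower CER is better, and a value above 1.0 is possible when the hypothesis is much longer than the reference.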

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similar-sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

image.png
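
As a rough illustration of choosing a quant, the sketch below picks a GGUF file from a list of file names using a preference order. The default order (Q4_K_M, then Q4_K_S) is an assumption based on this card's recommendation, `pick_quant` is a hypothetical helper, and the actual file names in the repository are not listed here, so any names you pass in are your own.

```python
import re
from typing import Iterable, Optional, Sequence

# Assumed preference order, following the card's Q4_K_S/M recommendation.
DEFAULT_PREFERENCE = ("Q4_K_M", "Q4_K_S")


def pick_quant(
    filenames: Iterable[str],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Optional[str]:
    """Return the first .gguf file whose name contains a preferred quant tag."""
    ggufs = [name for name in filenames if name.lower().endswith(".gguf")]
    for tag in preference:
        # Match the tag as a whole token so "Q4_K_S" does not match "IQ4_K_S".
        pattern = re.compile(
            rf"(?<![A-Za-z0-9]){re.escape(tag)}(?![A-Za-z0-9_])",
            re.IGNORECASE,
        )
        for name in ggufs:
            if pattern.search(name):
                return name
    return None  # No preferred quant found; the caller decides what to do.
```

The file list could come from `huggingface_hub.list_repo_files("prithivMLmods/GutenOCR-3B-AIO-GGUF")`, and the chosen file could then be fetched with `huggingface_hub.hf_hub_download`. Vision models in GGUF form typically also need a separate multimodal projector file, which this helper does not select.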

Format: GGUF
Model size: 3B params
Architecture: qwen2vl