GutenOCR-3B-AIO-GGUF

GutenOCR-3B from rootsautomation is a 3B-parameter grounded OCR vision-language model fine-tuned from Qwen2.5-VL-3B. It is part of the GutenOCR family, which is designed as a unified front-end for full-page reading, text detection, and grounding through prompt-specified input-output schemas. It supports line- and paragraph-level bounding boxes as well as conditional "where is x?" queries, across business documents, scientific articles, and synthetic data. On 10.5K held-out pages it more than doubles the composite grounded OCR score of its Qwen2.5-VL backbone (0.348 → 0.811, roughly 2.3×). It also substantially improves region- and line-level CER and text-detection F1 on the Fox and OmniDocBench v1.5 benchmarks, while revealing trade-offs in page linearization, color-guided OCR, and formula-heavy layouts. A single checkpoint handles reading, detection, and grounding, which suits business workflows and human-in-the-loop verification. GGUF quantizations are available for efficient deployment; Q4_K_S and Q4_K_M are recommended, at 1.9-2.0GB.
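For readers unfamiliar with the CER figures mentioned above, here is a minimal sketch of character error rate in its usual textbook form: the Levenshtein edit distance between the reference text and the OCR output, divided by the reference length. The function names `levenshtein` and `cer` are illustrative, and this is not necessarily the exact normalization or scoring code used in the GutenOCR evaluation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1 each)."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: an empty reference scores 0.0 against an empty hypothesis
    and 1.0 otherwise; real evaluation code may handle this case differently.
    """
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)
```

Lower is better: a perfect transcription gives 0.0, and a single wrong character in a 12-character line such as `cer("Total: 42.00", "Total: 42.0O")` gives 1/12.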

Quants Usage

(Sorted by size, not necessarily quality. IQ-quants are often preferable over similar-sized non-IQ quants.)
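Since the quants are sorted by size, one simple way to choose among them is to take the largest file that fits a memory budget. The sketch below does that with a hypothetical helper, `pick_quant`. The size table is an assumption: it reads the "1.9-2.0GB" figure above as 1.9 GB for Q4_K_S and 2.0 GB for Q4_K_M, and other quants are not listed here. Because size is not necessarily quality, treat the result as a starting point rather than a recommendation.

```python
from typing import Dict, Optional

# Assumed approximate file sizes in GB, inferred from the "1.9-2.0GB" range
# quoted for Q4_K_S/M; check the actual files before relying on them.
QUANT_SIZES_GB: Dict[str, float] = {
    "Q4_K_S": 1.9,
    "Q4_K_M": 2.0,
}


def pick_quant(sizes_gb: Dict[str, float], budget_gb: float) -> Optional[str]:
    """Return the name of the largest quant that fits the budget, or None."""
    fitting = {name: size for name, size in sizes_gb.items() if size <= budget_gb}
    if not fitting:
        return None
    # Largest-that-fits is only a heuristic: a bigger file is not always better.
    return max(fitting, key=lambda name: fitting[name])
```

The budget should leave headroom for the context cache and any vision projector file, so it is usually smaller than the total memory on the device.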

[Graph by ikawrakow comparing some lower-quality quant types; lower is better.]


Format: GGUF
Model size: 3B params
Architecture: qwen2vl

Hardware compatibility: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for prithivMLmods/GutenOCR-3B-AIO-GGUF

Base model: Qwen/Qwen2.5-VL-3B-Instruct
Finetuned: rootsautomation/GutenOCR-3B
Quantized (3): this model