Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published 24 days ago • 28
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models Paper • 2601.21639 • Published 10 days ago • 49
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 19 days ago • 22
Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 16 items • Updated 4 days ago • 54
Step-Audio-R1 Collection Step-Audio-R1 is the first audio language model to successfully unlock test-time compute scaling. • 4 items • Updated 25 days ago • 18
LightOnOCR 🦉 Collection The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR • 8 items • Updated 19 days ago • 15
view article Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 73
MathCanvas Collection Datasets and models for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" • 5 items • Updated Nov 19, 2025 • 3
MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning Paper • 2510.14958 • Published Oct 16, 2025 • 23