LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
Said Taghadouini, Adrien Cavaillès, Baptiste Aubertin

TL;DR
LightOnOCR-2-1B is a compact, multilingual vision-language model that converts document images into accurate, naturally ordered text. It outperforms larger models on OCR benchmarks and adds bounding-box localization for embedded images, along with robustness improvements.
Contribution
The paper introduces LightOnOCR-2-1B, a 1-billion-parameter end-to-end multilingual model that achieves state-of-the-art results on OlmOCR-Bench, extends its output format to localize embedded images, and is substantially smaller and faster than prior best-performing models.
Findings
Achieves state-of-the-art results on OlmOCR-Bench
9× smaller and substantially faster than prior best-performing models
Bounding-box localization for embedded images, plus robustness gains from checkpoint averaging and task-arithmetic merging
Abstract
We present LightOnOCR-2-1B, a 1B-parameter end-to-end multilingual vision-language model that converts document images (e.g., PDFs) into clean, naturally ordered text without brittle OCR pipelines. Trained on a large-scale, high-quality distillation mix with strong coverage of scans, French documents, and scientific PDFs, LightOnOCR-2 achieves state-of-the-art results on OlmOCR-Bench while being 9× smaller and substantially faster than prior best-performing models. We further extend the output format to predict normalized bounding boxes for embedded images, introducing localization during pretraining via a resume strategy and refining it with RLVR (reinforcement learning with verifiable rewards) using IoU-based rewards. Finally, we improve robustness with checkpoint averaging and task-arithmetic merging. We release model checkpoints under Apache 2.0, and publicly release the dataset and LightOnOCR-bbox-bench…
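The abstract names three mechanisms without spelling them out: an IoU-based reward for predicted image boxes, checkpoint averaging, and task-arithmetic merging. The Python sketch below illustrates generic versions of each under stated assumptions; it is not the authors' implementation. The greedy one-to-one box matching and the normalization by the larger box count in the reward are hypothetical choices, boxes are assumed to be (x0, y0, x1, y1) in normalized page coordinates, and checkpoints are modeled as toy dicts of flat float lists. The names `iou_reward`, `average_checkpoints`, and `task_arithmetic_merge` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: (x0, y0, x1, y1) in normalized page coordinates.
Box = Tuple[float, float, float, float]
# Toy stand-in for a checkpoint: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred: Sequence[Box], gold: Sequence[Box]) -> float:
    """Hypothetical verifiable reward: greedily match predicted and reference
    boxes one-to-one by descending IoU, then divide the summed IoU by the
    larger box count so missing and spurious boxes both lower the reward."""
    if not pred and not gold:
        return 1.0  # nothing to localize and nothing predicted
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_p, used_g, total = set(), set(), 0.0
    for score, i, j in pairs:
        if i not in used_p and j not in used_g:
            used_p.add(i)
            used_g.add(j)
            total += score
    return total / max(len(pred), len(gold))


def average_checkpoints(ckpts: List[Params]) -> Params:
    """Uniform element-wise average of checkpoints sharing one architecture."""
    n = len(ckpts)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in ckpts))]
        for name in ckpts[0]
    }


def task_arithmetic_merge(base: Params, finetuned: List[Params], scale: float) -> Params:
    """Task arithmetic: add the scaled sum of task vectors
    (finetuned weights minus base weights) back onto the base weights."""
    merged: Params = {}
    for name, w0 in base.items():
        deltas = [[w - b for w, b in zip(ft[name], w0)] for ft in finetuned]
        merged[name] = [b + scale * sum(d[i] for d in deltas) for i, b in enumerate(w0)]
    return merged


if __name__ == "__main__":
    print(iou_reward([(0.0, 0.0, 0.5, 0.5)], [(0.0, 0.0, 0.5, 0.5)]))  # 1.0
    a, b = {"w": [1.0, 0.0]}, {"w": [0.0, 2.0]}
    print(average_checkpoints([a, b]))  # {'w': [0.5, 1.0]}
    print(task_arithmetic_merge({"w": [0.0, 0.0]}, [a, b], scale=0.5))  # {'w': [0.5, 1.0]}
```

Averaging treats every checkpoint equally, while task arithmetic starts from a shared base and lets the `scale` factor control how strongly each fine-tuned behavior (for example, plain OCR versus box prediction) is blended in; how the paper weights or selects checkpoints is not specified here.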
Code & Models
- 🤗 lightonai/LightOnOCR-2-1B · model · 577k dl · ♡ 636
- 🤗 lightonai/LightOnOCR-1B-1025 · model · 169k dl · ♡ 247
- 🤗 lightonai/LightOnOCR-0.9B-16k-1025 · model · 24 dl · ♡ 12
- 🤗 lightonai/LightOnOCR-0.9B-32k-1025 · model · 151 dl · ♡ 19
- 🤗 lightonai/LightOnOCR-2-1B-bbox · model · 5.3k dl · ♡ 22
- 🤗 lightonai/LightOnOCR-2-1B-bbox-base · model · 233 dl · ♡ 3
- 🤗 lightonai/LightOnOCR-2-1B-base · model · 9.2k dl · ♡ 11
- 🤗 lightonai/LightOnOCR-2-1B-ocr-soup · model · 1.4k dl · ♡ 8
- 🤗 lightonai/LightOnOCR-2-1B-bbox-soup · model · 2.7k dl · ♡ 13
- 🤗 wjbmattingly/LightOnOCR-2-1B-old-church-slavonic-line · model
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
