UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters
Yongkun Du, Zhineng Chen, Yazhen Xie, Weikang Bai, Hao Feng, Wei Shi, Yuchen Su, Can Huang, Yu-Gang Jiang

TL;DR
UniRec-0.1B is a lightweight, unified model for text and formula recognition. Trained on a large-scale dataset with hierarchical supervision and semantic-decoupled tokenization, it outperforms larger models in both accuracy and speed.
Contribution
The paper introduces UniRec-0.1B, a novel 0.1-billion-parameter model for unified text and formula recognition, with new hierarchical supervision and semantic-decoupled tokenization methods.
Findings
Outperforms larger vision-language models and expert systems.
Runs 2 to 9 times faster than those models.
Demonstrates effectiveness across multiple languages and domains.
Abstract
Text and formulas constitute the core informational components of many documents. Accurately and efficiently recognizing both is crucial for developing robust and generalizable document parsing systems. Recently, vision-language models (VLMs) have achieved impressive unified recognition of text and formulas. However, they are large and computationally demanding, which restricts their use in many applications. In this paper, we propose UniRec-0.1B, a unified recognition model with only 0.1B parameters. It can perform text and formula recognition at multiple levels, including characters, words, lines, paragraphs, and documents. To implement this task, we first establish UniRec40M, a large-scale dataset comprising 40 million text, formula, and mixed text-formula samples, enabling the training of a powerful yet lightweight model. Second, we identify two challenges when building such…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
