Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, Yuan Qi

TL;DR
This paper presents Infinity-Parser, a parser for scanned documents trained with layoutRL, a layout-aware reinforcement learning framework. It achieves state-of-the-art accuracy and structural fidelity on English and Chinese benchmarks covering OCR, table and formula extraction, and reading order detection.
Contribution
The introduction of layoutRL and the Infinity-Doc-55K dataset enables end-to-end training of layout-aware parsers, improving robustness and adaptability over traditional multi-stage pipelines.
Findings
Achieved state-of-the-art performance on OCR, table, and formula extraction tasks.
Outperformed specialist pipelines and general vision-language models.
Demonstrated effectiveness across English and Chinese document benchmarks.
Abstract
Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse layouts. We introduce layoutRL, an end-to-end reinforcement learning framework that trains models to be explicitly layout-aware by optimizing a composite reward of normalized edit distance, paragraph count accuracy, and reading order preservation. Leveraging our newly released dataset, Infinity-Doc-55K, which combines 55K high-fidelity synthetic scanned document parsing data with expert-filtered real-world documents, we instantiate layoutRL in a vision-language-model-based parser called Infinity-Parser. Evaluated on English and Chinese benchmarks for OCR, table and formula extraction, and reading order detection, Infinity-Parser achieves new…
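The composite reward described in the abstract can be illustrated with a minimal sketch. The summary above does not give the exact formulas or weighting, so everything here is an assumption made for illustration: the function names (`edit_reward`, `count_reward`, `order_reward`, `composite_reward`), the blank-line paragraph splitting, the exact-match paragraph alignment, and the equal weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def split_paragraphs(text: str) -> List[str]:
    """Assumption: paragraphs are separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def edit_reward(pred: str, ref: str) -> float:
    """1 minus the length-normalized edit distance, in [0, 1]."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


def count_reward(pred_paras: List[str], ref_paras: List[str]) -> float:
    """Penalize the relative error in the number of paragraphs."""
    if not ref_paras:
        return 1.0 if not pred_paras else 0.0
    diff = abs(len(pred_paras) - len(ref_paras))
    return max(0.0, 1.0 - diff / len(ref_paras))


def order_reward(pred_paras: List[str], ref_paras: List[str]) -> float:
    """Fraction of matched paragraph pairs that keep the reference order.

    Assumption: paragraphs are aligned by exact string match, which is a
    simplification chosen to keep the sketch short.
    """
    positions = [pred_paras.index(p) for p in ref_paras if p in pred_paras]
    n = len(positions)
    if n < 2:
        return 1.0  # nothing to invert with fewer than two matches
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if positions[i] > positions[j]
    )
    return 1.0 - inversions / (n * (n - 1) // 2)


def composite_reward(
    pred: str,
    ref: str,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # assumed
) -> float:
    """Weighted sum of the three components (equal weights assumed)."""
    pred_paras, ref_paras = split_paragraphs(pred), split_paragraphs(ref)
    w_edit, w_count, w_order = weights
    return (
        w_edit * edit_reward(pred, ref)
        + w_count * count_reward(pred_paras, ref_paras)
        + w_order * order_reward(pred_paras, ref_paras)
    )


if __name__ == "__main__":
    reference = "Title\n\nFirst paragraph.\n\nSecond paragraph."
    swapped = "Title\n\nSecond paragraph.\n\nFirst paragraph."
    print(composite_reward(reference, reference))
    print(composite_reward(swapped, reference))
```

In this sketch, a prediction that swaps two paragraphs keeps a perfect paragraph-count score but loses credit on both the edit-distance and reading-order terms, which is the kind of layout sensitivity the composite reward is meant to provide.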
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
