AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation
Zhengren Wang, Dongsheng Ma, Huaping Zhong, Jiayu Li, Wentao Zhang, Bin Wang, Conghui He

TL;DR
AgenticOCR introduces a query-driven, selective OCR parsing method that enhances the efficiency and accuracy of retrieval-augmented generation for complex visual documents by focusing only on relevant regions.
Contribution
This paper presents a novel dynamic OCR parsing paradigm that enables on-demand, region-specific recognition, improving long document understanding in visual RAG systems.
Findings
Improves efficiency of visual RAG systems
Achieves expert-level performance in long document understanding
Reduces extraneous context and hallucinations
Abstract
The expansion of retrieval-augmented generation (RAG) into multimodal domains has intensified the challenge of processing complex visual documents, such as financial reports. While page-level chunking and retrieval are a natural starting point, they create a critical bottleneck: delivering entire pages to the generator introduces excessive extraneous context. This not only overloads the generator's attention mechanism but also dilutes the most salient evidence. Moreover, compressing these information-rich pages into a limited visual token budget further increases the risk of hallucinations. To address this, we introduce AgenticOCR, a dynamic parsing paradigm that transforms optical character recognition (OCR) from a static, full-text process into a query-driven, on-demand extraction system. By autonomously analyzing document layout in a "thinking with images" manner, AgenticOCR…
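To make the contrast between page-level context and query-driven region selection concrete, here is a minimal toy sketch. It is not the AgenticOCR method, which the abstract describes as analyzing layout in a "thinking with images" manner; the `Region` type, the `select_regions` helper, the token-overlap scoring, and the sample page text are all hypothetical and invented for illustration. The `text` field stands in for whatever signal a real system would use to decide which regions are worth parsing.

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical layout region on a page (not a type from the paper)."""
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    text: str    # stand-in for the content of this region


def tokens(s):
    """Lowercase alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def select_regions(query, regions, top_k=2):
    """Keep only the regions that share tokens with the query.

    Token overlap is a deliberately crude relevance score, assumed here
    purely for illustration; ties keep the original page order.
    """
    q = tokens(query)
    scored = [(len(q & tokens(r.text)), i, r) for i, r in enumerate(regions)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [r for _, _, r in scored[:top_k]]


# Invented sample page: three regions, only one relevant to the query.
regions = [
    Region(3, (0, 0, 600, 80), "Annual report 2023 cover letter to shareholders"),
    Region(3, (0, 100, 600, 400), "Net revenue 2023: 4.2 billion, up 8 percent"),
    Region(3, (0, 420, 600, 700), "Board of directors and committee membership"),
]

selected = select_regions("What was net revenue in 2023?", regions, top_k=1)
context = "\n".join(r.text for r in selected)  # what the generator would see
full_page = "\n".join(r.text for r in regions)  # page-level baseline
```

In this sketch the generator receives one region instead of the full page, which mirrors the abstract's point about reducing extraneous context; a real system would need a far stronger relevance signal than token overlap.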
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
