TL;DR
Revise is a framework that systematically corrects OCR errors at multiple levels using synthetic data, improving document understanding and management in Document AI applications.
Contribution
It introduces a hierarchical taxonomy of OCR errors and a synthetic data generation strategy to train effective correction models for structured document understanding.
Findings
Revise significantly improves OCR correction accuracy.
Enhanced OCR correction leads to better document retrieval and question answering performance.
The framework corrects structural errors in OCR outputs, in addition to character- and word-level errors.
Abstract
Recent advances in Large Language Models (LLMs) have significantly improved the field of Document AI, demonstrating remarkable performance on document understanding tasks such as question answering. However, existing approaches primarily focus on solving specific tasks, lacking the capability to structurally organize and manage document information. To address this limitation, we propose Revise, a framework that systematically corrects errors introduced by OCR at the character, word, and structural levels. Specifically, Revise employs a comprehensive hierarchical taxonomy of common OCR errors and a synthetic data generation strategy that realistically simulates such errors to train an effective correction model. Experimental results demonstrate that Revise effectively corrects OCR outputs, enabling more structured representation and systematic management of document contents.…
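The abstract describes simulating OCR errors at the character, word, and structural levels to build training data for a correction model. The sketch below illustrates that general idea only and is not the authors' implementation. The confusion table, the function names (`corrupt_chars`, `corrupt_words`, `corrupt_structure`, `make_pair`), the `rate` parameter, and the specific error types chosen at each level are all assumptions made for illustration.

```python
import random

# Hypothetical confusion table: visually similar glyphs (an assumption,
# not the paper's taxonomy).
CHAR_CONFUSIONS = {"l": "1", "O": "0", "m": "rn", "S": "5"}


def corrupt_chars(text, rng, rate=0.1):
    """Character level: replace a glyph with a look-alike with probability `rate`."""
    out = []
    for ch in text:
        if ch in CHAR_CONFUSIONS and rng.random() < rate:
            out.append(CHAR_CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)


def corrupt_words(text, rng, rate=0.1):
    """Word level: merge a word into the previous one, or split it in two."""
    out = []
    for word in text.split(" "):
        r = rng.random()
        if out and r < rate / 2:
            out[-1] += word  # lost space: merge with the previous word
        elif len(word) > 3 and r < rate:
            cut = rng.randint(1, len(word) - 1)
            out.append(word[:cut] + " " + word[cut:])  # spurious space
        else:
            out.append(word)
    return " ".join(out)


def corrupt_structure(lines, rng, rate=0.1):
    """Structural level: swap adjacent lines to mimic reading-order errors."""
    lines = list(lines)
    i = 0
    while i < len(lines) - 1:
        if rng.random() < rate:
            lines[i], lines[i + 1] = lines[i + 1], lines[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return lines


def make_pair(clean_lines, seed=0, rate=0.1):
    """Return a (noisy_input, clean_target) pair for training a corrector."""
    rng = random.Random(seed)
    noisy = corrupt_structure(clean_lines, rng, rate)
    noisy = [corrupt_words(corrupt_chars(line, rng, rate), rng, rate) for line in noisy]
    return "\n".join(noisy), "\n".join(clean_lines)
```

Calling `make_pair(["Invoice total", "Some amount due"], seed=1, rate=0.3)` yields a corrupted string paired with the clean text. Keeping the three levels as separate functions makes it possible to vary the error mix per level, which is one plausible way to cover a hierarchical error taxonomy.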