CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee

TL;DR
CREPE is a novel OCR-free, coordinate-aware sequence generation model for visual document understanding that jointly parses text and extracts its spatial coordinates. It achieves state-of-the-art results on document parsing and transfers to other document understanding tasks.
Contribution
CREPE introduces a unified, OCR-free approach that combines text parsing with token-triggered coordinate decoding. It also adds a weakly-supervised training framework that needs only parsing annotations, not coordinate annotations.
Findings
Achieves state-of-the-art performance on document parsing tasks.
Successfully applied to layout analysis and visual question answering.
Reduces error propagation compared to OCR-dependent methods.
Abstract
In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text using a multi-head architecture. Our method is named Coordinate-aware End-to-end Document Parser (CREPE). It uniquely integrates these capabilities by introducing a special token for OCR text and token-triggered coordinate decoding. We also propose a weakly-supervised framework for cost-efficient training. This framework requires only parsing annotations and avoids high-cost coordinate annotations. Our experimental evaluations demonstrate CREPE's state-of-the-art performance on document parsing tasks. CREPE's adaptability is further shown by its successful use in other document understanding tasks, such as layout analysis and document visual question answering.…
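Token-triggered coordinate decoding can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is a hypothetical placeholder. The `<ocr>`/`</ocr>` delimiter tokens, the `coord_head` callable and the `toy_coord_head` stand-in are assumptions, and the text decoder is replaced by a precomputed list of (token, hidden state) pairs. The sketch shows only the general idea: a separate coordinate head runs when the text head emits a special token, and it predicts a box for the text span that just closed.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical special tokens; the real CREPE vocabulary may differ.
OCR_OPEN = "<ocr>"
OCR_CLOSE = "</ocr>"

# Assumed box format: (x1, y1, x2, y2), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class ParsedText:
    text: str
    box: Box


def token_triggered_decode(
    steps: Sequence[Tuple[str, List[float]]],
    coord_head: Callable[[List[float]], Box],
) -> Tuple[List[str], List[ParsedText]]:
    """Walk the (token, hidden_state) pairs emitted by a text head.

    When the trigger token OCR_CLOSE appears, the coordinate head runs on
    that step's hidden state. It predicts a box for the tokens collected
    since the matching OCR_OPEN.
    """
    output_tokens: List[str] = []
    spans: List[ParsedText] = []
    buffer: List[str] = []
    inside = False
    for token, hidden in steps:
        output_tokens.append(token)
        if token == OCR_OPEN:
            inside = True
            buffer = []
        elif token == OCR_CLOSE and inside:
            # The special token triggers the second (coordinate) head.
            spans.append(ParsedText(" ".join(buffer), coord_head(hidden)))
            inside = False
        elif inside:
            buffer.append(token)
    return output_tokens, spans


def toy_coord_head(hidden: List[float]) -> Box:
    # Stand-in for a learned regression head: the sigmoid of the first four
    # features maps them into [0, 1] as box coordinates.
    x1, y1, x2, y2 = (1.0 / (1.0 + math.exp(-h)) for h in hidden[:4])
    return (x1, y1, x2, y2)
```

In a trained model, the hidden states would come from the decoder and `coord_head` would be a learned module. Here, a zero hidden state at the `</ocr>` step yields the box `(0.5, 0.5, 0.5, 0.5)` for the enclosed text.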
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
