OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

TL;DR
OmniParser is a unified model that handles text spotting, key information extraction, and table recognition simultaneously, achieving state-of-the-art or competitive results on 7 datasets with a simplified architecture.
Contribution
The paper presents OmniParser, a universal framework that unifies three visually-situated text parsing tasks using shared architecture and objectives, reducing complexity and improving performance.
Findings
Achieves SOTA or competitive results on 7 datasets
Unified model simplifies workflow across tasks
Demonstrates versatility across diverse document scenarios
Abstract
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions. Various methods have been proposed to address the challenging problem of VsTP. However, due to the diversified targets and heterogeneous schemas, previous works usually design task-specific architectures and objectives for individual tasks, which inadvertently leads to modal isolation and complex workflow. In this paper, we propose a unified paradigm for parsing visually-situated text across diverse scenarios. Specifically, we devise a universal model, called OmniParser, which can simultaneously handle three typical visually-situated text parsing tasks: text spotting, key information extraction, and table recognition. In…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Natural Language Processing Techniques
