OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Jianqiang Wan; Sibo Song; Wenwen Yu; Yuliang Liu; Wenqing Cheng; Fei Huang; Xiang Bai; Cong Yao; Zhibo Yang

arXiv:2403.19128 · cs.CV · March 29, 2024 · 1 citation

TL;DR

OmniParser introduces a unified model that simultaneously handles text spotting, key information extraction, and table recognition, achieving state-of-the-art results across multiple datasets with a simplified architecture.

Contribution

The paper presents OmniParser, a universal framework that unifies three visually-situated text parsing tasks under a shared architecture and shared objectives, reducing workflow complexity and improving performance.

Findings

01. Achieves SOTA or competitive results on 7 datasets

02. Unified model simplifies workflow across tasks

03. Demonstrates versatility across diverse document scenarios

Abstract

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions. Various methods have been proposed to address the challenging problem of VsTP. However, due to the diversified targets and heterogeneous schemas, previous works usually design task-specific architectures and objectives for individual tasks, which inadvertently leads to modal isolation and complex workflow. In this paper, we propose a unified paradigm for parsing visually-situated text across diverse scenarios. Specifically, we devise a universal model, called OmniParser, which can simultaneously handle three typical visually-situated text parsing tasks: text spotting, key information extraction, and table recognition. In…

Code & Models

Repositories

alibabaresearch/advancedliteratemachinery (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Natural Language Processing Techniques