Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

Chong Zhang; Ya Guo; Yi Tu; Huan Chen; Jinyang Tang; Huijia Zhu; Qi Zhang; Tao Gui

arXiv:2310.11016 · cs.CL · October 18, 2023 · 1 citation

TL;DR

This paper introduces Token Path Prediction (TPP), a novel approach for extracting entities from visually-rich documents that overcomes reading order issues affecting traditional sequence-labeling methods.

Contribution

The paper proposes TPP, modeling document layout as a token graph and predicting entity paths, along with revised benchmarks for realistic evaluation of VrD-NER systems.

Findings

01. TPP outperforms traditional BIO-tagging methods on VrD-NER tasks.

02. Revised benchmarks better reflect real-world OCR scenarios.

03. TPP shows potential as a universal solution for document information extraction.

Abstract

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting BIO entity tags for tokens, following the typical NLP setting. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. This reading-order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading-order issue, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as…
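
The reading-order issue described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-column scan that OCR reads row by row, simplified untyped B/I/O tags, and a hand-written `next_token` link map standing in for what a path-style prediction head might output. The function names `decode_bio` and `decode_paths` are illustrative only.

```python
from typing import Dict, List

def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect mentions from B/I/O tags, reading tokens in input order."""
    mentions: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            # A new mention starts; close any open one first.
            if current:
                mentions.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions

def decode_paths(tokens: List[str], next_token: Dict[int, int]) -> List[str]:
    """Follow 'next token' links; each chain of links is one mention.

    Single-token mentions have no links and are out of scope for this sketch.
    """
    targets = set(next_token.values())
    # A path starts at a token that has an outgoing link but no incoming one.
    starts = sorted(i for i in next_token if i not in targets)
    mentions: List[str] = []
    for start in starts:
        path, seen = [start], {start}
        while path[-1] in next_token and next_token[path[-1]] not in seen:
            nxt = next_token[path[-1]]
            path.append(nxt)
            seen.add(nxt)
        mentions.append(" ".join(tokens[i] for i in path))
    return mentions

# Hypothetical layout: "Alice Johnson" wraps over two lines in the left
# column and "Acme Corp" wraps in the right column. Row-by-row OCR
# interleaves the two entities.
tokens = ["Alice", "Acme", "Johnson", "Corp"]

# The most sensible B/I tags for this order still cannot keep them apart.
bio_tags = ["B", "B", "I", "I"]
bio_result = decode_bio(tokens, bio_tags)

# Links between token indices do not depend on the input order.
links = {0: 2, 1: 3}
path_result = decode_paths(tokens, links)

print(bio_result)   # ['Alice', 'Acme Johnson Corp']
print(path_result)  # ['Alice Johnson', 'Acme Corp']
```

In this toy case no assignment of B/I tags to the interleaved sequence recovers both mentions, whereas explicit links between tokens do, which is the intuition behind predicting paths instead of per-token tags.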

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Multimodal Machine Learning Applications