StrucTexT: Structured Text Understanding with Multi-Modal Transformers

Yulin Li; Yuxi Qian; Yuchen Yu; Xiameng Qin; Chengquan Zhang; Yan Liu; Kun Yao; Junyu Han; Jingtuo Liu; Errui Ding

arXiv:2108.02923·cs.CV·November 9, 2021·6 cites

TL;DR

StrucTexT is a unified multi-modal transformer framework for structured text understanding in visually rich documents. It handles both entity labeling and entity linking, relies on a novel pre-training strategy, and outperforms existing methods on FUNSD, SROIE, and EPHOIE.

Contribution

The paper presents a flexible, unified transformer-based framework, together with a new pre-training strategy, for structured text understanding at both the token and segment levels.

Findings

01. Outperforms state-of-the-art methods on the FUNSD, SROIE, and EPHOIE datasets.

02. Effectively handles both entity labeling and entity linking tasks.

03. Utilizes multi-modal information across text, image, and layout.

Abstract

Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence. Due to the complexity of content and layout in VRDs, structured text understanding has been a challenging task. Most existing studies decoupled this problem into two sub-tasks: entity labeling and entity linking, which require an entire understanding of the context of documents at both token and segment levels. However, little work has been concerned with the solutions that efficiently extract the structured data from different levels. This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks. Specifically, based on the transformer, we introduce a segment-token aligned encoder to deal with the entity labeling and entity linking tasks at different levels of granularity. Moreover, we design a novel pre-training strategy with…
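The abstract is cut off before it explains how the encoder relates tokens to segments, so the following is only an illustrative sketch of what working at two levels of granularity can look like. It assumes each token carries a feature vector and a segment id, and it uses plain mean pooling to derive one vector per segment. The function name `pool_tokens_to_segments`, the toy inputs, and the pooling choice are all hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


def pool_tokens_to_segments(token_features, segment_ids):
    """Average the feature vectors of all tokens that share a segment id.

    Hypothetical helper: mean pooling is an assumption made for this
    illustration, not the alignment mechanism described in the paper.
    """
    if len(token_features) != len(segment_ids):
        raise ValueError("one segment id is required per token")

    # Group token vectors by the segment they belong to.
    grouped = defaultdict(list)
    for vec, seg in zip(token_features, segment_ids):
        grouped[seg].append(vec)

    # Column-wise mean over each group gives one vector per segment.
    return {
        seg: [sum(col) / len(vecs) for col in zip(*vecs)]
        for seg, vecs in grouped.items()
    }


# Toy input (made up): 5 tokens with 2-d features, grouped into 2 segments.
token_features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0], [4.0, 0.0]]
segment_ids = [0, 0, 1, 1, 1]
segment_features = pool_tokens_to_segments(token_features, segment_ids)
```

In a setup like this, a token-level task would read the per-token vectors directly, while a segment-level task such as scoring whether two segments are linked would read the pooled vectors.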

Code & Models

Repositories

PaddlePaddle/VIMER/tree/main/StrucTexT (Paddle, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Natural Language Processing Techniques