Enhancing Visually-Rich Document Understanding via Layout Structure Modeling

Qiwei Li; Zuchao Li; Xiantao Cai; Bo Du; Hai Zhao

arXiv:2308.07777·cs.CL·August 16, 2023

TL;DR

This paper introduces GraphLayoutLM, a document understanding model that builds a layout structure graph over a document's text elements and uses it to model their spatial relationships, improving results on the FUNSD, XFUND, and CORD benchmarks.

Contribution

The paper presents GraphLayoutLM, a novel approach that integrates layout structure graphs and layout-aware self-attention to enhance visually-rich document understanding.
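
This page does not say how the layout-aware self-attention layer consumes the graph. One common pattern, shown here purely as an illustrative sketch and not as the authors' formulation, is to add a bias to the attention scores of token pairs that are linked in the layout graph before the softmax. The function name `layout_aware_attention`, the edge-list representation, and the `bias` value are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layout_aware_attention(scores, edges, bias=1.0):
    """Add `bias` to the score of every pair linked in the layout graph.

    scores: n x n list of raw attention scores (query i, key j).
    edges:  undirected layout-graph edges as (i, j) index pairs.
    The additive bias is an assumption made for illustration only.
    """
    n = len(scores)
    linked = {(i, j) for i, j in edges} | {(j, i) for i, j in edges}
    weights = []
    for i in range(n):
        row = [
            scores[i][j] + (bias if (i, j) in linked else 0.0)
            for j in range(n)
        ]
        weights.append(softmax(row))
    return weights


# Three text nodes with identical raw scores; nodes 0 and 1 are linked.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = layout_aware_attention(scores, edges=[(0, 1)])
```

With uniform raw scores, node 0 attends more strongly to its layout neighbor (node 1) than to node 2, while node 2, which has no edges, keeps a uniform distribution.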

Findings

01. Achieves state-of-the-art results on the FUNSD, XFUND, and CORD datasets.

02. Both graph reordering and layout-aware attention are needed for the best performance.

03. Improves over existing models by incorporating layout structure information.

Abstract

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting the importance of the layout relationships between text nodes. In this paper, we propose GraphLayoutLM, a novel document understanding model that models a layout structure graph to inject document layout knowledge into the model. GraphLayoutLM uses a graph reordering algorithm to adjust the text sequence based on the graph structure. Additionally, our model uses a layout-aware multi-head self-attention layer to learn document layout knowledge. The proposed model enables the understanding of the spatial arrangement of text elements, improving document comprehension. We evaluate our model on various benchmarks, including FUNSD, XFUND and…
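
The abstract says the text sequence is adjusted based on the graph structure but does not give the algorithm. As a minimal sketch of the general idea, assume the layout graph is a tree of text nodes with bounding boxes, and reorder the sequence with a depth-first traversal that visits siblings top to bottom, then left to right. The `LayoutNode` class, the `reorder` function, and the sorting rule are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    """Hypothetical layout-graph node: text, bounding box, child nodes."""
    text: str
    box: tuple  # (x0, y0, x1, y1) in page coordinates
    children: list = field(default_factory=list)


def reorder(nodes):
    """Return texts in depth-first order.

    Siblings are sorted by top edge, then left edge (an assumed rule),
    so each parent is followed immediately by its own subtree.
    """
    ordered = []
    for node in sorted(nodes, key=lambda n: (n.box[1], n.box[0])):
        ordered.append(node.text)
        ordered.extend(reorder(node.children))
    return ordered


# A two-column form: a plain top-to-bottom scan would interleave the columns
# as "Name:", "Date:", "Alice", "2023-08-16".
left = LayoutNode("Name:", (0, 0, 50, 10),
                  [LayoutNode("Alice", (0, 12, 50, 22))])
right = LayoutNode("Date:", (100, 0, 150, 10),
                   [LayoutNode("2023-08-16", (100, 12, 150, 22))])

sequence = reorder([right, left])
print(sequence)  # ['Name:', 'Alice', 'Date:', '2023-08-16']
```

The traversal keeps each key next to its value, which is the kind of ordering a sequence model benefits from when the raw OCR order mixes unrelated regions.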

Code & Models

Repositories

line-kite/graphlayoutlm (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Video Analysis and Summarization · Natural Language Processing Techniques