Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, Yue Zhang, Mingxu Chai, Ya Guo, Huijia Zhu, Qi Zhang, Tao Gui

TL;DR
This paper introduces a new way to model layout reading order in visually-rich documents as ordering relations over layout elements. This representation is more expressive than a single permutation and improves performance on downstream document understanding tasks.
Contribution
It proposes modeling reading order as ordering relations over layout elements, establishes a benchmark dataset for this form of reading order prediction, and develops a relation-extraction method that improves performance on downstream visually-rich document (VrD) tasks.
Findings
Improved reading order modeling leads to state-of-the-art results.
Relation-based approach outperforms permutation-based methods.
Enhanced models show consistent performance gains across multiple tasks.
Abstract
Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence because it captures the rich structural semantics within documents. Previous works typically formulated layout reading order as a permutation of layout elements, i.e., a sequence containing all the layout elements. However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may lead to performance decline in downstream VrD tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which are expressive enough to capture the complete reading order information. To enable empirical evaluation of methods for this improved form of reading order prediction (ROP), we establish a comprehensive benchmark dataset including the…
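The contrast between the two formulations can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that reading order is stored as a set of directed pairs (a, b) meaning "a is read before b"; the paper's exact relation definition may differ. The element names and helper functions below are hypothetical. A permutation always induces a total order, so every pair of elements ends up ordered. A relation set can leave some pairs unordered, which makes it consistent with several linear sequences.

```python
from itertools import combinations, permutations

# Hypothetical representation: a pair (a, b) means "element a is read before element b".
Relation = tuple[str, str]


def permutation_to_relations(perm: list[str]) -> set[Relation]:
    """Encode a permutation as immediate-succession pairs between neighbours."""
    return {(perm[i], perm[i + 1]) for i in range(len(perm) - 1)}


def precedes_closure(relations: set[Relation]) -> set[Relation]:
    """Transitive closure: add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(relations)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def unordered_pairs(elements: list[str], relations: set[Relation]) -> set[frozenset]:
    """Pairs of elements whose relative reading order is left unspecified."""
    closure = precedes_closure(relations)
    return {
        frozenset((a, b))
        for a, b in combinations(elements, 2)
        if (a, b) not in closure and (b, a) not in closure
    }


def consistent_orders(elements: list[str], relations: set[Relation]) -> list[list[str]]:
    """Brute-force every linear sequence that respects all relations (small inputs only)."""
    orders = []
    for order in permutations(elements):
        pos = {e: i for i, e in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in relations):
            orders.append(list(order))
    return orders


elements = ["title", "para1", "para2", "caption"]

# Permutation view: the sequence forces the caption to come after para2.
perm_rel = permutation_to_relations(elements)

# Relation view: the caption follows the title but is unordered w.r.t. the body paragraphs.
rel = {("title", "para1"), ("para1", "para2"), ("title", "caption")}

print(unordered_pairs(elements, perm_rel))  # set()
print(unordered_pairs(elements, rel))       # two pairs: para1/caption and para2/caption
print(len(consistent_orders(elements, rel)))  # 3
```

In this example the relation set leaves the caption unordered relative to both body paragraphs, so three linear sequences are consistent with it. Any single permutation has to commit to one of them and therefore asserts an order that the layout does not specify.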
