DLAFormer: An End-to-End Transformer For Document Layout Analysis

Jiawei Wang; Kai Hu; Qiang Huo

arXiv:2405.11757·cs.CV·May 21, 2024

TL;DR

DLAFormer is an end-to-end transformer model that unifies multiple document layout analysis tasks into a single framework, improving accuracy over previous multi-stage methods on the DocLayNet and Comp-HRDoc benchmarks.

Contribution

It introduces a unified relation prediction approach with type-wise queries and a coarse-to-fine strategy for comprehensive document layout analysis.

Findings

01. Outperforms previous multi-branch models on DocLayNet and Comp-HRDoc.

02. Effectively integrates sub-tasks into one model with relation prediction.

03. Enhances content query interpretability with type-wise queries.

Abstract

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically used separate models to address individual sub-tasks within DLA, including table/figure detection, text region detection, logical role classification, and reading order prediction. In this work, we propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer, which integrates all these sub-tasks into a single model. To achieve this, we treat various DLA sub-tasks (such as text region detection, logical role classification, and reading order prediction) as relation prediction problems and consolidate these relation prediction labels into a unified label space, allowing a unified relation prediction module to handle…
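The idea of casting several sub-tasks as relation prediction over one label space can be illustrated with a small sketch. This is not the authors' implementation: the `Relation` enum, the `ROLE_NAMES` list, and the helper functions are hypothetical names chosen for illustration, and the relations are hard-coded here rather than predicted by a model. The sketch only shows how a single set of (source, target, relation) triples could be decoded into text regions, logical roles, and a reading order.

```python
from collections import defaultdict
from enum import IntEnum


class Relation(IntEnum):
    """Hypothetical unified label space shared by all sub-tasks."""
    SAME_REGION = 1    # text region detection: two text lines share a region
    HAS_ROLE = 2       # logical role classification: line -> role index
    READ_NEXT = 3      # reading order: target line is read after source line


# Assumed role vocabulary, for illustration only.
ROLE_NAMES = ["title", "paragraph", "caption"]


def group_regions(num_lines, relations):
    """Merge lines linked by SAME_REGION into regions using union-find."""
    parent = list(range(num_lines))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst, rel in relations:
        if rel == Relation.SAME_REGION:
            parent[find(src)] = find(dst)

    groups = defaultdict(list)
    for i in range(num_lines):
        groups[find(i)].append(i)
    return sorted(groups.values())


def decode_roles(relations):
    """Map each line with a HAS_ROLE relation to its role name."""
    return {src: ROLE_NAMES[dst] for src, dst, rel in relations
            if rel == Relation.HAS_ROLE}


def decode_reading_order(relations, start):
    """Follow READ_NEXT links from a starting line."""
    nxt = {src: dst for src, dst, rel in relations
           if rel == Relation.READ_NEXT}
    order, seen = [start], {start}
    while order[-1] in nxt and nxt[order[-1]] not in seen:
        order.append(nxt[order[-1]])
        seen.add(order[-1])
    return order


# Five text lines; all triples live in one list with one label space.
relations = [
    (0, 1, Relation.SAME_REGION),
    (1, 2, Relation.SAME_REGION),
    (3, 4, Relation.SAME_REGION),
    (0, 1, Relation.HAS_ROLE),   # line 0 -> "paragraph"
    (3, 2, Relation.HAS_ROLE),   # line 3 -> "caption"
    (0, 3, Relation.READ_NEXT),
]

regions = group_regions(5, relations)
roles = decode_roles(relations)
order = decode_reading_order(relations, start=0)
```

Keeping every relation in one list of triples means a single prediction head would only need to emit one label per pair, and each sub-task becomes a different way of reading the same output.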

Taxonomy

Topics: Handwritten Text Recognition Techniques

Methods: Sparse Evolutionary Training · Deep Layer Aggregation