A Graphical Approach to Document Layout Analysis
Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, and Chris Tanner

TL;DR
This paper introduces GLAM, a lightweight graph neural network for document layout analysis that leverages PDF metadata, achieving competitive accuracy with significantly reduced model size and improved efficiency.
Contribution
The paper presents GLAM, a novel graph neural network model that uses structured PDF metadata to compete with much larger vision-based models on DLA tasks, outperforming them on several classes.
Findings
GLAM outperforms a 140M+ parameter model on several classes.
An ensemble of GLAM and a vision model achieves a new state of the art.
GLAM is over 5 times more efficient than existing models.
Abstract
Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets - while being an order of magnitude smaller than existing models. In particular,…
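To make the "page as a graph" framing in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The `Box` schema, the `max_gap` threshold, and the hand-written proximity rule in `build_edges` are all assumptions made for illustration; in a model like GLAM the decision to keep or drop an edge would come from a learned network rather than a fixed rule. The sketch shows only the segmentation half of the idea: text boxes become nodes, retained edges link boxes that belong together, and connected components of the resulting graph are the layout segments.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A text box as PDF metadata might expose it (hypothetical schema)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def vertical_gap(a: Box, b: Box) -> float:
    """Vertical whitespace between two boxes; 0 if they overlap vertically."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def horizontal_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share some horizontal extent."""
    return min(a.x1, b.x1) - max(a.x0, b.x0) > 0


def build_edges(boxes, max_gap=5.0):
    """Hand-written stand-in for a learned edge classifier (assumption):
    keep an edge when two boxes overlap horizontally and sit close vertically."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if horizontal_overlap(boxes[i], boxes[j])
        and vertical_gap(boxes[i], boxes[j]) <= max_gap
    ]


def segments(n, edges):
    """Group node indices into connected components using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy page: one title line followed by a three-line paragraph.
boxes = [
    Box("A Graphical Approach", 50, 40, 400, 60),
    Box("Document layout analysis is", 50, 100, 400, 112),
    Box("the task of detecting", 50, 114, 400, 126),
    Box("content within a document.", 50, 128, 300, 140),
]
page_segments = segments(len(boxes), build_edges(boxes))
print(page_segments)
```

Each resulting group of node indices corresponds to one candidate layout element. Assigning a category such as text, title, or figure to each group would be a separate classification step, which this sketch leaves out.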
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Graph Neural Network · Deep Layer Aggregation
