Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Ruben Tolosana, Oscar Delgado-Mohatar, Alvaro Ortigosa

TL;DR
This paper benchmarks Graph Neural Network (GNN) architectures for fine-grained layout classification of text blocks in digital-born PDF documents, showing that local layout relationships and multimodal (text plus visual) fusion both improve accuracy.
Contribution
It introduces two graph construction methods (a k-closest-neighbor graph and a fully connected graph), evaluates single-modality and multimodal GNN frameworks on a large, diverse dataset of public affairs documents, and identifies the best-performing configurations.
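The two graph structures can be sketched over the text blocks of a page as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how "closest" is measured, so the sketch assumes Euclidean distance between bounding-box centers, and the helpers `center`, `knn_graph`, `fully_connected_graph` and `symmetrize` are hypothetical names.

```python
import math


def center(box):
    """Center of a text-block bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def knn_graph(boxes, k):
    """Directed edges (i, j) from each block i to its k closest blocks j.

    Assumption: closeness is the Euclidean distance between box centers.
    Ties are broken by block index so the result is deterministic.
    """
    centers = [center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centers):
        others = sorted(
            (j for j in range(len(boxes)) if j != i),
            key=lambda j: (math.dist(ci, centers[j]), j),
        )
        for j in others[:k]:
            edges.add((i, j))
    return edges


def fully_connected_graph(n):
    """Every ordered pair of distinct blocks is an edge: n * (n - 1) edges."""
    return {(i, j) for i in range(n) for j in range(n) if i != j}


def symmetrize(edges):
    """Add the reverse of every edge, giving an undirected graph."""
    return edges | {(j, i) for i, j in edges}
```

The design trade-off is visible in the edge counts: the k-closest-neighbor graph grows as n * k and only links nearby blocks, while the fully connected graph grows as n * (n - 1) and lets every block see the whole page. With four blocks and k = 1, for instance, the first yields 4 directed edges and the second 12.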
Findings
GraphSAGE with the k-closest-neighbor graph achieves the highest accuracy.
The dual-branch multimodal GNN outperforms single-modality models.
Local layout relationships and multimodal fusion are crucial for document analysis.
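For readers unfamiliar with GraphSAGE, a single layer with a mean aggregator can be sketched in plain Python. This follows the general GraphSAGE idea (combine a node's own features with the mean of its neighbors' features) and is not taken from the paper: the aggregator, activation, depth and any normalization the authors used are not stated in this summary. Writing `W_self @ h_v + W_neigh @ mean` is equivalent to applying one weight matrix to the concatenation of the two vectors; the optional L2 normalization of the original GraphSAGE formulation is omitted. `matvec` and `sage_mean_layer` are hypothetical names.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(features, edges, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator.

    h_v' = ReLU(W_self @ h_v + W_neigh @ mean({h_u : (u, v) in edges}))
    Edges are (src, dst) pairs; node v aggregates from its in-neighbors.
    """
    n, dim = len(features), len(features[0])
    in_neighbors = [[] for _ in range(n)]
    for src, dst in edges:
        in_neighbors[dst].append(src)

    out = []
    for v in range(n):
        nbrs = in_neighbors[v]
        if nbrs:
            agg = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            agg = [0.0] * dim  # isolated block: no neighborhood signal
        pre = [a + b for a, b in zip(matvec(W_self, features[v]), matvec(W_neigh, agg))]
        out.append([max(0.0, z) for z in pre])  # ReLU
    return out
```

Because each layer only mixes a block with its direct neighbors, pairing it with a sparse k-closest-neighbor graph keeps the aggregation focused on nearby layout context, which is one plausible reading of why local relationships matter in the findings above.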
Abstract
The automatic analysis of document layouts in digital-born PDF documents remains a challenging problem due to the heterogeneous arrangement of textual and nontextual elements and the imprecision of the textual metadata in the Portable Document Format. In this work, we benchmark Graph Neural Network (GNN) architectures for the task of fine-grained layout classification of text blocks from digital-native documents. We introduce two graph structures, a k-closest-neighbor graph and a fully connected graph, and generate node features via pre-trained text and vision models, thus avoiding manual feature engineering. Three experimental frameworks are evaluated: single-modality (text or visual), concatenated multimodal, and dual-branch multimodal. We evaluate four foundational GNN models and compare them with a baseline. Our experiments are specifically conducted on a rich…
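The difference between the concatenated and dual-branch multimodal frameworks can be sketched as follows. This is an illustration under assumptions, not the paper's architecture: the abstract does not say where the dual-branch model fuses its branches, so late fusion (concatenating the two branches' node embeddings before a linear classifier) is assumed. The layer `gnn_layer` and the classes `ConcatenatedGNN` and `DualBranchGNN` are hypothetical; the weights are random and untrained, and the short text and visual vectors stand in for embeddings from pre-trained models.

```python
import random


def linear(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gnn_layer(features, edges, W):
    """Hypothetical minimal GNN layer: average each node with its in-neighbors, then W and ReLU."""
    groups = [[v] for v in range(len(features))]
    for src, dst in edges:
        groups[dst].append(src)
    dim = len(features[0])
    out = []
    for g in groups:
        avg = [sum(features[u][d] for u in g) / len(g) for d in range(dim)]
        out.append([max(0.0, z) for z in linear(W, avg)])
    return out


def random_matrix(rows, cols, rng):
    """Untrained weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ConcatenatedGNN:
    """Concatenated framework: per-block text and visual vectors are joined, then one GNN."""

    def __init__(self, d_text, d_vis, d_hidden, n_classes, rng=None):
        rng = rng if rng is not None else random.Random(0)
        self.W = random_matrix(d_hidden, d_text + d_vis, rng)
        self.C = random_matrix(n_classes, d_hidden, rng)

    def __call__(self, text, vis, edges):
        x = [t + v for t, v in zip(text, vis)]  # early fusion of the two modalities
        h = gnn_layer(x, edges, self.W)
        return [linear(self.C, hv) for hv in h]  # per-block class logits


class DualBranchGNN:
    """Dual-branch framework: one GNN per modality, embeddings fused before the classifier."""

    def __init__(self, d_text, d_vis, d_hidden, n_classes, rng=None):
        rng = rng if rng is not None else random.Random(0)
        self.W_text = random_matrix(d_hidden, d_text, rng)
        self.W_vis = random_matrix(d_hidden, d_vis, rng)
        self.C = random_matrix(n_classes, 2 * d_hidden, rng)

    def __call__(self, text, vis, edges):
        h_text = gnn_layer(text, edges, self.W_text)
        h_vis = gnn_layer(vis, edges, self.W_vis)
        # Late fusion (assumed): concatenate the branch outputs for each block.
        return [linear(self.C, ht + hv) for ht, hv in zip(h_text, h_vis)]
```

In the concatenated model a single weight matrix can mix text and visual dimensions from the first layer on, whereas the dual-branch model keeps each modality in its own parameter set until fusion. A single-modality model corresponds to running only one branch, with a classifier over its output.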
Taxonomy
Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques
Methods: Graph Neural Network · GraphSAGE
