PARAGRAPH2GRAPH: A GNN-based framework for layout paragraph analysis
Shu Wei, Nuo Xu

TL;DR
Paragraph2Graph is a language-independent GNN-based framework for document layout analysis that overcomes language dependency and length limitations of current methods, offering competitive results and suitability for industrial multi-language applications.
Contribution
We introduce Paragraph2Graph, a novel GNN-based model that is language-independent, handles long documents, and is suitable for industrial multi-language layout analysis.
Findings
Achieves competitive results on standard layout datasets.
Uses only 19.95 million parameters, making it suitable for industrial deployment.
Effectively handles multi-language and long document scenarios.
Abstract
Document layout analysis has a wide range of requirements across various domains, languages, and business scenarios. However, most current state-of-the-art algorithms are language-dependent, with architectures that rely on transformer encoders or language-specific text encoders, such as BERT, for feature extraction. These approaches are limited in their ability to handle very long documents due to input sequence length constraints and are closely tied to language-specific tokenizers. Additionally, training a cross-language text encoder can be challenging due to the lack of labeled multilingual document datasets that consider privacy. Furthermore, some layout tasks require a clean separation between different layout components without overlap, which can be difficult for image segmentation-based algorithms to achieve. In this paper, we present Paragraph2Graph, a language-independent graph…
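The idea of a language-independent graph over layout elements can be illustrated with a minimal sketch. It is not the authors' implementation: the `Box` class, the six geometry-only node features, and the k-nearest-neighbour edge rule are all assumptions chosen for illustration, since the summary above does not specify how nodes or edges are built. The point it shows is that the graph input can be derived from bounding boxes alone, with no tokenizer or text encoder involved.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box of one text element, in page pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def node_features(box, page_w, page_h):
    """Geometry-only features normalized by page size (assumed feature set, no text used)."""
    return [
        box.x0 / page_w, box.y0 / page_h,
        box.x1 / page_w, box.y1 / page_h,
        (box.x1 - box.x0) / page_w,   # relative width
        (box.y1 - box.y0) / page_h,   # relative height
    ]


def knn_edges(boxes, k=2):
    """Connect each box to its k nearest neighbours by centre distance (assumed edge rule).

    Returns directed (source, target) index pairs.
    """
    edges = []
    for i, a in enumerate(boxes):
        ax, ay = a.center
        dists = sorted(
            (hypot(ax - b.center[0], ay - b.center[1]), j)
            for j, b in enumerate(boxes) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Toy page: two adjacent lines and one distant line.
boxes = [
    Box(50, 50, 300, 70),
    Box(50, 75, 300, 95),
    Box(50, 400, 300, 420),
]
features = [node_features(b, page_w=600, page_h=800) for b in boxes]
edges = knn_edges(boxes, k=1)
print(edges)
```

In this sketch the number of nodes grows with the number of boxes on the page rather than being capped by a fixed input sequence length, which is the kind of property the abstract contrasts with transformer text encoders. A GNN would then consume `features` and `edges`; that part is omitted here.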
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Graph Neural Network · Linear Layer · Adam · Attention Dropout · WordPiece · Dense Connections · Dropout · Weight Decay
