GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas, Carlos Boned, Josep Lladós, Sanket Biswas

TL;DR
GeoContrastNet is a language-agnostic framework built on graph attention networks (GATs) that combines geometric and visual features for document understanding. The authors report that this combination can match large OCR-dependent models in accuracy and efficiency.
Contribution
The paper proposes a novel two-stage GAT-based framework integrating geometric edge features with visual cues for language-agnostic document understanding.
Findings
Promising results on both link prediction and semantic entity recognition.
Combining geometric and visual features could match large OCR-based DU models in accuracy and efficiency.
Excels in key-value and spatial relationship detection.
Abstract
This paper presents GeoContrastNet, a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs), emphasizing the significant role of geometric features. We propose a novel methodology that combines geometric edge features with visual features within an overall two-stage GAT-based framework, demonstrating promising results in both link prediction and semantic entity recognition performance. Our findings reveal that combining both geometric and visual features could match the capabilities of large DU models that rely heavily on Optical Character Recognition (OCR) features in terms of performance accuracy and efficiency. This approach underscores the critical importance of relational layout information between the named text entities in a semi-structured layout of a page. Specifically, our…
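To make "geometric edge features" concrete, here is a minimal Python sketch of how a document graph with geometry-only edge attributes might be built from text-region bounding boxes. The abstract does not list the exact features or the graph construction the authors use, so everything here is an illustrative assumption: the function names `center`, `edge_features`, and `knn_edges`, the choice of a k-nearest-neighbour graph, and the four features (normalized offsets, distance, angle) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple

# A text region as (x0, y0, x1, y1) in page pixel coordinates.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def edge_features(src: Box, dst: Box, page_w: float, page_h: float) -> List[float]:
    """Geometry-only features for a directed edge src -> dst.

    Assumed feature set (not from the paper): centre offsets normalized
    by page size, their Euclidean norm, and the direction angle in radians.
    """
    (sx, sy), (dx, dy) = center(src), center(dst)
    off_x = (dx - sx) / page_w
    off_y = (dy - sy) / page_h
    dist = math.hypot(off_x, off_y)
    angle = math.atan2(off_y, off_x)
    return [off_x, off_y, dist, angle]


def knn_edges(boxes: List[Box], k: int) -> List[Tuple[int, int]]:
    """Connect each region to its k nearest regions by centre distance."""
    centers = [center(b) for b in boxes]
    edges: List[Tuple[int, int]] = []
    for i, (cx, cy) in enumerate(centers):
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.hypot(centers[j][0] - cx, centers[j][1] - cy),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


if __name__ == "__main__":
    # Toy page: a key, a value to its right, and another key below it.
    boxes = [(10, 10, 60, 20), (70, 10, 150, 20), (10, 40, 60, 50)]
    for i, j in knn_edges(boxes, k=1):
        print(i, j, edge_features(boxes[i], boxes[j], page_w=200, page_h=100))
```

Because these features use no text content, a graph built this way is language-agnostic by construction. In a pipeline like the one the abstract describes, such edge attributes would be combined with visual features and passed to a GAT, but that part is not sketched here.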
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Contrastive Learning
