GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding

Nil Biescas; Carlos Boned; Josep Lladós; Sanket Biswas

arXiv:2405.03104 · cs.CV · May 7, 2024

TL;DR

GeoContrastNet is a language-agnostic framework built on graph attention networks (GATs) that combines geometric and visual features for document understanding. The authors report that it can match large OCR-dependent models in accuracy and efficiency.

Contribution

The paper proposes a novel two-stage GAT-based framework integrating geometric edge features with visual cues for language-agnostic document understanding.

Findings

01 Promising results in link prediction and semantic entity recognition.

02 Can match large OCR-based models in accuracy and efficiency.

03 Performs well at detecting key-value and spatial relationships.

Abstract

This paper presents GeoContrastNet, a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs), emphasizing the significant role of geometric features. We propose a novel methodology that combines geometric edge features with visual features within an overall two-stage GAT-based framework, demonstrating promising results in both link prediction and semantic entity recognition. Our findings reveal that combining geometric and visual features could match the capabilities of large DU models that rely heavily on Optical Character Recognition (OCR) features, in terms of both accuracy and efficiency. This approach underscores the critical importance of relational layout information between the named text entities in the semi-structured layout of a page. Specifically, our…
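
As a rough illustration of what "geometric edge features" between two text entities could look like, the sketch below computes the distance and angle between the centers of two bounding boxes. This is an assumption made for illustration only: the abstract does not say which geometric quantities the paper uses, and the names `box_center` and `edge_geometry` are hypothetical, not taken from the official repository.

```python
import math


def box_center(box):
    """Return the center (x, y) of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def edge_geometry(src, dst):
    """Hypothetical geometric edge features for the edge src -> dst.

    Returns the Euclidean distance between the box centers and the
    angle (in radians) of dst as seen from src. The paper's actual
    feature set may differ; this is only a sketch.
    """
    sx, sy = box_center(src)
    dx, dy = box_center(dst)
    distance = math.hypot(dx - sx, dy - sy)
    angle = math.atan2(dy - sy, dx - sx)
    return distance, angle


# A "key" box and a "value" box to its right on the same text line.
key_box = (10, 10, 50, 20)
value_box = (70, 10, 130, 20)
distance, angle = edge_geometry(key_box, value_box)
```

Features of this kind depend only on layout, not on recognized text, which is one way a model could stay language-agnostic; in a graph setting they would be attached to the edge connecting the two entity nodes.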

Code & Models

Repositories

NilBiescas/GeoContrastNet
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Contrastive Learning