TL;DR
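FUDGE is a graph-based model that iteratively edits a graph of text elements in form images, merging text segments into entities and pruning false-positive links. It reaches high accuracy without relying on a large pre-trained language model, and is especially effective on degraded forms and on forms in resource-poor languages.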
Contribution
The paper introduces a novel graph editing approach for form understanding that reduces dependence on large pre-trained language models, enabling effective processing of challenging form images.
Findings
Achieves state-of-the-art on the historical NAF dataset.
Performs comparably to large LM-based methods on FUNSD with only visual features.
Effective on degraded forms and on forms in resource-poor languages.
Abstract
We address the problem of form understanding: finding text entities and the relationships/links between them in form images. The proposed FUDGE model formulates this problem on a graph of text elements (the vertices) and uses a Graph Convolutional Network to predict changes to the graph. The initial vertices are detected text lines and do not necessarily correspond to the final text entities, which can span multiple lines. Also, initial edges contain many false-positive relationships. FUDGE edits the graph structure by combining text segments (graph vertices) and pruning edges in an iterative fashion to obtain the final text entities and relationships. While recent work in this area has focused on leveraging large-scale pre-trained Language Models (LM), FUDGE achieves almost the same level of entity linking performance on the FUNSD dataset by learning only visual features from the…
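The iterative edit loop the abstract describes (prune false-positive edges, merge text segments that belong to one entity, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `edit_graph`, the thresholds, the iteration count, and the `toy_scorer` that stands in for the Graph Convolutional Network are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]
# Hypothetical scorer: returns (keep_probability, merge_probability) for an edge.
# In FUDGE this role is played by a GCN; here it is any callable.
Scorer = Callable[[Dict[int, List[str]], Edge], Tuple[float, float]]


def edit_graph(
    vertices: Dict[int, List[str]],
    edges: Set[Edge],
    score_edge: Scorer,
    iterations: int = 3,        # assumed value, not from the paper
    keep_thresh: float = 0.5,   # assumed value
    merge_thresh: float = 0.5,  # assumed value
) -> Tuple[Dict[int, List[str]], Set[Edge]]:
    """Iteratively prune edges and merge vertices of a text-line graph."""
    vertices = {v: list(lines) for v, lines in vertices.items()}
    edges = set(edges)
    for _ in range(iterations):
        scores = {e: score_edge(vertices, e) for e in edges}

        # 1) Prune edges predicted to be false-positive relationships.
        edges = {e for e in edges if scores[e][0] >= keep_thresh}

        # 2) Group vertices joined by "merge" edges using union-find.
        parent = {v: v for v in vertices}

        def find(v: int) -> int:
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        for u, v in sorted(edges):
            if scores[(u, v)][1] >= merge_thresh:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[max(ru, rv)] = min(ru, rv)

        # 3) Rebuild vertices (merged text lines) and remap surviving edges.
        merged: Dict[int, List[str]] = {}
        for v in sorted(vertices):
            merged.setdefault(find(v), []).extend(vertices[v])
        remapped: Set[Edge] = set()
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:  # edges inside a merged entity disappear
                remapped.add((min(ru, rv), max(ru, rv)))
        vertices, edges = merged, remapped
    return vertices, edges


def toy_scorer(vertices: Dict[int, List[str]], edge: Edge) -> Tuple[float, float]:
    """Hand-written stand-in for a learned model (illustration only)."""
    a, b = vertices[edge[0]], vertices[edge[1]]
    if a[-1] == "John" and b[0] == "Smith":
        return (1.0, 1.0)  # same entity: keep and merge
    if a[0] == "Name:" and b[0] == "John":
        return (1.0, 0.0)  # label-to-value link: keep, do not merge
    return (0.0, 0.0)      # everything else is pruned


lines = {0: ["Name:"], 1: ["John"], 2: ["Smith"], 3: ["Date:"]}
candidate_edges = {(0, 1), (1, 2), (0, 3), (0, 2)}
final_vertices, final_edges = edit_graph(lines, candidate_edges, toy_scorer)
print(final_vertices, final_edges)
```

In the toy run, the two detected lines "John" and "Smith" collapse into a single entity while the label-to-value link survives, which mirrors the point that initial vertices are text lines rather than final entities.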
Taxonomy
Methods: Pruning
