Visual FUDGE: Form Understanding via Dynamic Graph Editing

Brian Davis, Bryan Morse, Brian Price, Chris Tensmeyer, Curtis Wigington

arXiv:2105.08194·cs.CV·July 19, 2021

TL;DR

FUDGE is a graph-based model that iteratively edits a graph of text entities detected in form images. It achieves high accuracy with minimal reliance on language models and is especially effective on degraded forms and on forms in resource-poor languages.

Contribution

The paper introduces a novel graph editing approach for form understanding that reduces dependence on large pre-trained language models, enabling effective processing of challenging form images.

Findings

01. Achieves state-of-the-art results on the historical NAF dataset.

02. Performs comparably to large LM-based methods on FUNSD using only visual features.

03. Is effective on degraded forms and on forms in resource-poor languages.

Abstract

We address the problem of form understanding: finding text entities and the relationships/links between them in form images. The proposed FUDGE model formulates this problem on a graph of text elements (the vertices) and uses a Graph Convolutional Network to predict changes to the graph. The initial vertices are detected text lines and do not necessarily correspond to the final text entities, which can span multiple lines. Also, initial edges contain many false-positive relationships. FUDGE edits the graph structure by combining text segments (graph vertices) and pruning edges in an iterative fashion to obtain the final text entities and relationships. While recent work in this area has focused on leveraging large-scale pre-trained Language Models (LM), FUDGE achieves almost the same level of entity linking performance on the FUNSD dataset by learning only visual features from the…
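
The iterative editing loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the real model scores edges with a Graph Convolutional Network over visual features, whereas here a hypothetical `score` callback stands in for that network, and the names `edit_graph`, `toy_score`, the thresholds, and the iteration count are all assumptions made for illustration. The sketch only shows the control flow: in each round, edges are either used to merge their endpoints into one entity, kept as relationships, or pruned.

```python
from typing import Callable, FrozenSet, Set, Tuple

Vertex = FrozenSet[int]   # a group of detected text-line ids
Edge = FrozenSet[Vertex]  # an unordered pair of distinct vertices
# Hypothetical stand-in for the learned GCN. It must be symmetric and returns
# (probability the edge is a true relationship, probability both ends are one entity).
Scorer = Callable[[Vertex, Vertex], Tuple[float, float]]


def edit_graph(vertices: Set[Vertex], edges: Set[Edge], score: Scorer,
               iterations: int = 3, keep_thresh: float = 0.5,
               merge_thresh: float = 0.5) -> Tuple[Set[Vertex], Set[Edge]]:
    """Iteratively merge vertices and prune edges (illustrative only)."""
    vertices, edges = set(vertices), set(edges)
    for _ in range(iterations):
        parent = {v: v for v in vertices}  # union-find over current vertices

        def find(v: Vertex) -> Vertex:
            while parent[v] != v:
                v = parent[v]
            return v

        kept = set()
        for edge in edges:
            a, b = tuple(edge)
            keep_p, merge_p = score(a, b)
            if merge_p >= merge_thresh:
                parent[find(a)] = find(b)   # same entity: combine the segments
            elif keep_p >= keep_thresh:
                kept.add(edge)              # plausible relationship: keep
            # otherwise the edge is pruned as a false positive

        # Build the merged vertices from the union-find groups.
        groups = {}
        for v in vertices:
            groups.setdefault(find(v), set()).update(v)
        merged = {root: frozenset(lines) for root, lines in groups.items()}

        # Re-attach surviving edges to the merged vertices.
        new_vertices = set(merged.values())
        new_edges = set()
        for edge in kept:
            a, b = tuple(edge)
            na, nb = merged[find(a)], merged[find(b)]
            if na != nb:
                new_edges.add(frozenset((na, nb)))

        if new_vertices == vertices and new_edges == edges:
            break  # nothing changed, so further rounds would be no-ops
        vertices, edges = new_vertices, new_edges
    return vertices, edges


def toy_score(a: Vertex, b: Vertex) -> Tuple[float, float]:
    """Hand-written oracle: lines 0 and 1 form one entity that links to line 2."""
    entity, answer = frozenset({0, 1}), frozenset({2})
    if (a | b) <= entity:
        return 1.0, 1.0
    if (a <= entity and b == answer) or (b <= entity and a == answer):
        return 1.0, 0.0
    return 0.0, 0.0


lines = [frozenset({i}) for i in range(4)]
initial_edges = {frozenset((lines[i], lines[j]))
                 for i, j in [(0, 1), (1, 2), (0, 3), (2, 3)]}
entities, links = edit_graph(set(lines), initial_edges, toy_score)
```

In this toy run, four detected text lines start with four candidate edges. Lines 0 and 1 are combined into a single entity, the edge from that entity to line 2 survives, and the two edges touching line 3 are pruned.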

Taxonomy

Methods: Pruning