VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction

Thanh-Dat Nguyen, Tung Do-Viet, Hung Nguyen-Duy, Tuan-Hai Luu, Hung Le, Bach Le, and Patanamon (Pick) Thongtanunam

arXiv:2407.06826·cs.AI·July 10, 2024

TL;DR

VRDSynth is a novel program synthesis approach for extracting entity relations from multilingual visually rich documents without pre-training, outperforming state-of-the-art models in accuracy and efficiency.

Contribution

We introduce VRDSynth, a domain-specific language and synthesis algorithm for multilingual VRD information extraction, eliminating the need for pre-training data.

Findings

01 Outperforms pre-trained models in 5 to 7 out of 8 languages.

02 Improves F1 score by 42% over LayoutXLM in English.

03 Reduces memory footprint significantly while maintaining efficiency.
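
For readers unfamiliar with the metric, F1 in entity linking is commonly computed over the set of predicted links compared against the gold links. The following is a minimal sketch of that standard computation, not the paper's evaluation code; the function name `link_f1` and the sample link pairs are hypothetical.

```python
def link_f1(predicted, gold):
    """Return (precision, recall, F1) over sets of (key_id, value_id) links."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)  # links that appear in both sets
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three predicted links are correct.
gold_links = {(0, 1), (2, 3), (4, 5)}
pred_links = {(0, 1), (2, 3), (4, 6)}
precision, recall, f1 = link_f1(pred_links, gold_links)
```

Here precision and recall are both 2/3, so F1 is also 2/3.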

Abstract

Businesses need to query visually rich documents (VRDs) such as receipts, medical records, and insurance forms to make decisions. Existing techniques for extracting entities from VRDs struggle with new layouts or require extensive pre-training data. We introduce VRDSynth, a program synthesis method that automatically extracts entity relations from multilingual VRDs without pre-training data. To handle the complexity of the VRD domain, we design a domain-specific language (DSL) that expresses synthesized programs in terms of spatial and textual relations. We also derive a new synthesis algorithm that uses frequent spatial relations, search space pruning, and a combination of positive, negative, and exclusive programs to improve coverage. We evaluate VRDSynth on the FUNSD and XFUND benchmarks for semantic entity linking, which together comprise 1,592 forms in 8 languages. VRDSynth…
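
To give a rough intuition for how programs over spatial relations can link entities, the sketch below pairs a "positive" rule that proposes links with a "negative" rule that filters them. It is an illustration under stated assumptions, not VRDSynth's actual DSL or synthesis algorithm: the `Entity` type, the `right_of` relation, both rules, and the sample form are hypothetical and hand-written rather than synthesized, and exclusive programs are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: int
    label: str  # assumed labels: "question" or "answer"
    x0: float
    y0: float
    x1: float
    y1: float


def right_of(a, b, max_gap=200.0):
    """True if b starts to the right of a and overlaps it vertically."""
    vertical_overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    return vertical_overlap > 0 and 0 <= b.x0 - a.x1 <= max_gap


def positive_program(entities):
    """Propose question -> answer links when the answer is right of the question."""
    return {
        (a.id, b.id)
        for a in entities
        for b in entities
        if a.label == "question" and b.label == "answer" and right_of(a, b)
    }


def negative_program(entities, links):
    """Reject a link when another question lies between its two entities."""
    by_id = {e.id: e for e in entities}
    rejected = set()
    for key_id, value_id in links:
        a, b = by_id[key_id], by_id[value_id]
        for c in entities:
            if (c.label == "question" and c.id != a.id
                    and right_of(a, c) and right_of(c, b)):
                rejected.add((key_id, value_id))
    return rejected


def extract_links(entities):
    """Keep the links proposed by the positive rule minus those rejected."""
    proposed = positive_program(entities)
    return proposed - negative_program(entities, proposed)


# Hypothetical single-row form: "Name: Alice   Date: 2024-07-10"
form = [
    Entity(0, "question", 10, 10, 60, 20),
    Entity(1, "answer", 70, 10, 130, 20),
    Entity(2, "question", 140, 10, 180, 20),
    Entity(3, "answer", 190, 10, 250, 20),
]
links = extract_links(form)
```

In this toy form the positive rule also proposes a link from the first question to the last answer, and the negative rule removes it because a second question sits between them.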

Taxonomy

Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Handwritten Text Recognition Techniques