VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding
Yihao Ding, Soyeon Caren Han, Yan Li, Josiah Poon

TL;DR
This paper discusses the VRD-IU competition, which focused on extracting key information from complex, multi-format forms in visually rich documents. It highlights the innovative methodologies submitted and the new benchmarks they set in document understanding.
Contribution
It introduces the VRD-IU competition and dataset, showcasing state-of-the-art approaches for key information extraction from complex forms, and provides insights into effective techniques.
Findings
Top models achieved new performance benchmarks.
Hierarchical and transformer-based methods proved effective.
Multimodal fusion improved information localization accuracy.
Abstract
Visually Rich Document Understanding (VRDU) has emerged as a critical field in document intelligence, enabling automated extraction of key information from complex documents across domains such as medical, financial, and educational applications. However, form-like documents pose unique challenges due to their complex layouts, multi-stakeholder involvement, and high structural variability. Addressing these issues, the VRD-IU Competition was introduced, focusing on extracting and localizing key information from multi-format forms within the Form-NLU dataset, which includes digital, printed, and handwritten documents. This paper presents insights from the competition, which featured two tracks: Track A, emphasizing entity-based key information retrieval, and Track B, targeting end-to-end key information localization from raw document images. With over 20 participating teams, the…
Taxonomy
Topics: Semantic Web and Ontologies
