VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding

Yihao Ding; Soyeon Caren Han; Yan Li; Josiah Poon

arXiv:2506.01388·cs.CV·June 3, 2025

TL;DR

This paper reports on the VRD-IU competition, which focused on extracting key information from complex, multi-format forms in visually rich documents. It highlights the participants' methodologies and the new benchmarks they set in document understanding.

Contribution

The paper introduces the VRD-IU competition, built on the Form-NLU dataset, showcases state-of-the-art approaches for key information extraction from complex forms, and provides insights into effective techniques.

Findings

01. Top models achieved new performance benchmarks.

02. Hierarchical and transformer-based methods proved effective.

03. Multimodal fusion improved information localization accuracy.

Abstract

Visually Rich Document Understanding (VRDU) has emerged as a critical field in document intelligence, enabling automated extraction of key information from complex documents across domains such as medical, financial, and educational applications. However, form-like documents pose unique challenges due to their complex layouts, multi-stakeholder involvement, and high structural variability. Addressing these issues, the VRD-IU Competition was introduced, focusing on extracting and localizing key information from multi-format forms within the Form-NLU dataset, which includes digital, printed, and handwritten documents. This paper presents insights from the competition, which featured two tracks: Track A, emphasizing entity-based key information retrieval, and Track B, targeting end-to-end key information localization from raw document images. With over 20 participating teams, the…

Taxonomy

Topics: Semantic Web and Ontologies