SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Jiefeng Ma; Yan Wang; Chenyu Liu; Jun Du; Yu Hu; Zhenrong Zhang; Pengfei Hu; Qing Wang; Jianshu Zhang

arXiv:2406.08757·cs.CL·June 14, 2024

TL;DR

SRFUND introduces a comprehensive, multi-task benchmark with refined annotations and hierarchical structure recovery for form understanding across eight languages, advancing the analysis of complex document layouts.

Contribution

It provides a new hierarchical, multi-granularity dataset with detailed annotations and global structure dependencies, surpassing previous datasets limited to local annotations.

Findings

01 New challenges in handling diverse layouts and global hierarchies.

02 Enhanced cross-lingual form understanding capabilities.

03 Baseline methods demonstrate the dataset's complexity.

Abstract

Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local, entity-level annotations. This limitation overlooks the hierarchically structured representation of documents and constrains comprehensive understanding of complex forms. To address this issue, we present SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously…
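To make the multi-granularity idea concrete, the following is a minimal sketch of how words, text lines, entities, and parent links could be represented and turned into a document tree. It is not the SRFUND annotation format: the class names, the `parent_id` field, the `render` helper, and the sample form are all hypothetical, and the category labels follow the FUNSD-style header/question/answer convention as an assumption. Item table localization (task 4) is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Word:
    text: str


@dataclass
class TextLine:
    words: List[Word]  # task 1: words merged into one text line

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


@dataclass
class Entity:
    entity_id: int
    lines: List[TextLine]            # task 2: text lines merged into one entity
    category: str                    # task 3: entity category (illustrative labels)
    parent_id: Optional[int] = None  # task 5: parent entity, None for a root

    @property
    def text(self) -> str:
        return " ".join(line.text for line in self.lines)


def build_tree(entities: List[Entity]) -> Tuple[List[int], Dict[int, List[int]]]:
    """Group entity ids into roots and a parent -> children map."""
    children: Dict[int, List[int]] = {e.entity_id: [] for e in entities}
    roots: List[int] = []
    for e in entities:
        if e.parent_id is None:
            roots.append(e.entity_id)
        else:
            children[e.parent_id].append(e.entity_id)
    return roots, children


def render(entities: List[Entity]) -> List[str]:
    """Return one indented line per entity, in depth-first order."""
    by_id = {e.entity_id: e for e in entities}
    roots, children = build_tree(entities)
    rows: List[str] = []

    def visit(eid: int, depth: int) -> None:
        e = by_id[eid]
        rows.append("  " * depth + f"[{e.category}] {e.text}")
        for child in children[eid]:
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return rows


def make_line(s: str) -> TextLine:
    """Helper for the example: split a string into Word objects."""
    return TextLine([Word(t) for t in s.split()])


# Hypothetical form fragment: a header, a question under it, a two-line answer.
entities = [
    Entity(0, [make_line("Applicant Information")], "header"),
    Entity(1, [make_line("Full name")], "question", parent_id=0),
    Entity(2, [make_line("Jane"), make_line("Doe")], "answer", parent_id=1),
]
rows = render(entities)
print("\n".join(rows))
```

The parent links are what distinguish a full-document hierarchy from pairwise key-value links: a question can sit under a header, and an answer under a question, so the whole form is recovered as a single tree rather than a set of isolated pairs.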

Videos

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding · SlidesLive

Taxonomy

Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage