SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

TL;DR
SRFUND introduces a comprehensive, multi-task benchmark with refined annotations and hierarchical structure recovery for form understanding across eight languages, advancing the analysis of complex document layouts.
Contribution
SRFUND provides a new hierarchical, multi-granularity dataset with detailed annotations and global structure dependencies, going beyond previous datasets that are limited to local, entity-level annotations.
Findings
New challenges in handling diverse layouts and global hierarchies.
Enhanced cross-lingual form understanding capabilities.
Baseline methods demonstrate the dataset's complexity.
Abstract
Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously…
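To make the granularity levels in the abstract concrete, the following is a minimal Python sketch of how words, text-lines, entities, and a document-level hierarchy could nest. It is an illustration only and not SRFUND's actual annotation schema: the class names (`TextLine`, `Entity`), the `parent_id` field, and the helpers `build_tree` and `outline` are all hypothetical, and the category strings are borrowed from the original FUNSD label set as an assumption. Item table localization (task 4) is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TextLine:
    """A text-line formed by merging words (task 1 in the abstract)."""
    words: List[str]

    @property
    def text(self) -> str:
        return " ".join(self.words)


@dataclass
class Entity:
    """An entity formed by merging text-lines (task 2), with a category (task 3)."""
    entity_id: int
    category: str                    # assumed labels: header / question / answer / other
    lines: List[TextLine]
    parent_id: Optional[int] = None  # hypothetical link used for the hierarchy (task 5)

    @property
    def text(self) -> str:
        return " ".join(line.text for line in self.lines)


def build_tree(entities: List[Entity]) -> Tuple[List[int], Dict[int, List[int]]]:
    """Group entity ids under their parent; entities without a parent are roots."""
    children: Dict[int, List[int]] = {e.entity_id: [] for e in entities}
    roots: List[int] = []
    for e in entities:
        if e.parent_id is None:
            roots.append(e.entity_id)
        else:
            children[e.parent_id].append(e.entity_id)
    return roots, children


def outline(entity_id: int, by_id: Dict[int, Entity],
            children: Dict[int, List[int]], depth: int = 0) -> List[str]:
    """Render one subtree as indented rows, one row per entity."""
    entity = by_id[entity_id]
    rows = ["  " * depth + f"[{entity.category}] {entity.text}"]
    for child_id in children[entity_id]:
        rows.extend(outline(child_id, by_id, children, depth + 1))
    return rows


# Made-up sample form: a header that owns a question, which owns an answer.
entities = [
    Entity(0, "header", [TextLine(["Contact", "details"])]),
    Entity(1, "question", [TextLine(["Name:"])], parent_id=0),
    Entity(2, "answer", [TextLine(["Jane", "Doe"])], parent_id=1),
]
by_id = {e.entity_id: e for e in entities}
roots, children = build_tree(entities)
rows = outline(roots[0], by_id, children)
print("\n".join(rows))
```

A parent pointer per entity is only one possible encoding of a full-document hierarchy; it is chosen here because it keeps each entity record flat while still allowing the tree to be rebuilt in a single pass.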
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage
