DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting
Md Mofijul Islam, Md Sirajus Salekin, Nivedha Balakrishnan, Vincil C. Bishop III, Niharika Jain, Spencer Romo, Bob Strahan, Boyi Xie, Diego A. Socolinsky

TL;DR
This paper introduces DocSplit, a comprehensive benchmark dataset and evaluation framework for the challenging task of document packet splitting, addressing real-world complexities and evaluating multimodal large language models.
Contribution
It provides the first extensive benchmark dataset and novel metrics for document packet splitting, formalizes the task, and evaluates current models' performance on complex document scenarios.
Findings
Current models show significant performance gaps when splitting complex document packets.
The datasets cover diverse document types, layouts, and multimodal settings.
The benchmark facilitates future research in document understanding for various domains.
Abstract
Document understanding in real-world applications often requires processing heterogeneous, multi-page document packets containing multiple documents stitched together. Despite recent advances in visual document understanding, the fundamental task of document packet splitting, which involves separating a document packet into individual units, remains largely unaddressed. We present the first comprehensive benchmark dataset, DocSplit, along with novel evaluation metrics for assessing the document packet splitting capabilities of large language models. DocSplit comprises five datasets of varying complexity, covering diverse document types, layouts, and multimodal settings. We formalize the DocSplit task, which requires models to identify document boundaries, classify document types, and maintain correct page ordering within a document packet. The benchmark addresses real-world challenges,…
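To make the task statement concrete, here is a minimal sketch of how a packet split could be represented and checked. It is an illustration under stated assumptions, not the paper's formalization or its evaluation metrics: the names `DocumentSegment`, `is_valid_split`, and `exact_match`, and the labels such as "invoice", are hypothetical. Each predicted document carries a type label and the packet page indices in their intended reading order, which covers the three requirements named in the abstract (boundaries, types, page ordering).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentSegment:
    """One document recovered from a packet (hypothetical structure)."""
    doc_type: str          # assumed label, e.g. "invoice"
    page_order: List[int]  # 0-based packet page indices, in reading order


def is_valid_split(segments: List[DocumentSegment], num_pages: int) -> bool:
    """Check that every packet page is assigned to exactly one document."""
    seen = [page for seg in segments for page in seg.page_order]
    return sorted(seen) == list(range(num_pages))


def exact_match(pred: List[DocumentSegment], gold: List[DocumentSegment]) -> bool:
    """Strict comparison: grouping, type labels, and in-document page order
    must all agree. This is a simple stand-in, not the paper's metric."""
    def key(seg: DocumentSegment):
        return (seg.doc_type, tuple(seg.page_order))
    return sorted(map(key, pred)) == sorted(map(key, gold))


# A three-page packet holding a two-page invoice followed by a one-page form.
gold = [DocumentSegment("invoice", [0, 1]), DocumentSegment("form", [2])]
# Same grouping and types, but the invoice pages are in the wrong order.
pred = [DocumentSegment("invoice", [1, 0]), DocumentSegment("form", [2])]
```

With these inputs, `is_valid_split(pred, 3)` holds because every page is assigned once, while `exact_match(pred, gold)` fails because the page order inside the invoice differs. A real evaluation would likely score boundaries, types, and ordering separately rather than with one all-or-nothing check.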
Taxonomy
Topics: Handwritten Text Recognition Techniques · Network Packet Processing and Optimization · Advanced Neural Network Applications
