DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems
Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

TL;DR
DocBench is a comprehensive benchmark designed to evaluate the performance of LLM-based document reading systems across diverse real-world scenarios, highlighting current gaps and guiding future improvements.
Contribution
This paper introduces DocBench, the first standardized benchmark for assessing LLM-based document reading systems with real documents and synthetic questions across multiple domains.
Findings
Existing systems lag behind human performance.
Significant challenges remain in multi-modal and long-context understanding.
The benchmark reveals gaps and guides future research.
Abstract
Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, multi-modal information understanding and long-context reading. However, no current benchmark exists to evaluate their performance in such scenarios, where a raw file and questions are provided as input, and a corresponding response is expected as output. In this paper, we introduce DocBench, a new benchmark designed to evaluate LLM-based document reading systems. Our benchmark involves a meticulously crafted process, including the recruitment of human annotators and the generation of synthetic…
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Semantic Web and Ontologies
