SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents
Jaehoon Lee, Sohyun Kim, Wanggeun Park, Geon Lee, Seungkyung Kim, Minyoung Lee

TL;DR
This paper introduces SDS KoPub VDR, a large-scale benchmark dataset for visual document retrieval in Korean public documents, addressing language and structural complexity gaps in existing VDR benchmarks.
Contribution
It provides the first comprehensive Korean public document dataset with multimodal queries and human-verified annotations for evaluating VDR models.
Findings
The benchmark reveals significant performance gaps in multimodal retrieval tasks.
State-of-the-art models struggle with cross-modal reasoning.
The dataset enables detailed evaluation of document understanding models.
Abstract
Existing benchmarks for visual document retrieval (VDR) largely overlook non-English languages and the structural complexity of official publications. To address this gap, we introduce SDS KoPub VDR, the first large-scale, public benchmark for retrieving and understanding Korean public documents. The benchmark is built upon 361 real-world documents, including 256 files under the KOGL Type 1 license and 105 from official legal portals, capturing complex visual elements like tables, charts, and multi-column layouts. To establish a reliable evaluation set, we constructed 600 query-page-answer triples. These were initially generated using multimodal models (e.g., GPT-4o) and subsequently underwent human verification to ensure factual accuracy and contextual relevance. The queries span six major public domains and are categorized by the reasoning modality required: text-based, visual-based,…
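As a rough illustration of how a query-page-answer triple from such a benchmark might be represented and scored, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `QueryTriple` field names, the modality labels, the page identifiers, and the choice of Recall@k as the metric. The abstract above does not specify the dataset schema or which metrics are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTriple:
    """One query-page-answer triple. Field names are hypothetical."""
    query: str
    page_id: str   # identifier of the ground-truth page (hypothetical format)
    answer: str
    modality: str  # reasoning modality label, e.g. "text" or "visual" (hypothetical)


def recall_at_k(triples, rankings, k):
    """Fraction of queries whose ground-truth page appears in the top-k results.

    `rankings` maps each query string to a ranked list of page ids,
    as produced by some retrieval model (not shown here).
    """
    if not triples:
        return 0.0
    hits = sum(1 for t in triples if t.page_id in rankings.get(t.query, [])[:k])
    return hits / len(triples)


# Toy data, invented purely for this example.
triples = [
    QueryTriple("q1", "doc1_p3", "a1", "text"),
    QueryTriple("q2", "doc2_p7", "a2", "visual"),
]
rankings = {
    "q1": ["doc1_p3", "doc1_p4"],
    "q2": ["doc2_p1", "doc2_p7"],
}

print(recall_at_k(triples, rankings, k=1))  # 0.5
print(recall_at_k(triples, rankings, k=2))  # 1.0
```

Filtering `triples` by `modality` before calling `recall_at_k` would give a per-modality breakdown, which is one way the reasoning-modality categories could be used in an evaluation.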
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques
