DocHop-QA: Towards Multi-Hop Reasoning over Multimodal Document Collections
Jiwon Park, Seohyun Pyeon, Jinwoo Kim, Rina Carines Cabal, Yihao Ding, Soyeon Caren Han

TL;DR
DocHop-QA is a large-scale, multimodal, multi-document question answering benchmark based on scientific literature, designed to challenge and evaluate multi-hop reasoning capabilities of models across diverse formats and complex reasoning paths.
Contribution
It introduces a novel, domain-agnostic dataset with 11,379 instances, supporting open-ended, multimodal, multi-hop reasoning without relying on hyperlinks, and provides a comprehensive evaluation framework.
Findings
Demonstrates models' ability to perform complex multimodal reasoning
Shows the dataset's effectiveness in evaluating multi-hop QA
Highlights the need for advanced reasoning in scientific domains
Abstract
Despite recent advances in large language models (LLMs), most QA benchmarks are still confined to single-paragraph or single-document settings, failing to capture the complexity of real-world information-seeking tasks. Practical QA often requires multi-hop reasoning over information distributed across multiple documents, modalities, and structural formats. Although prior datasets made progress in this area, they rely heavily on Wikipedia-based content and unimodal plain text, with shallow reasoning paths that typically produce brief phrase-level or single-sentence answers, thus limiting their realism and generalizability. We propose DocHop-QA, a large-scale benchmark comprising 11,379 QA instances for multimodal, multi-document, multi-hop question answering. Constructed from publicly available scientific documents sourced from PubMed, DocHop-QA is domain-agnostic and incorporates…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Biomedical Text Mining and Ontologies
