Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning

Rongjin Li; Zichen Tang; Xianghe Wang; Xinyi Hu; Zhengyu Wang; Zhengyu Lu; Yiling Huang; Jiayuan Chen; Weisheng Tan; Jiacheng Liu; Zhongjun Yang; Haihong E

arXiv:2603.28651·cs.AI·March 31, 2026

TL;DR

ScholScan is a benchmark for evaluating multimodal large language models (MLLMs) on scan-oriented academic paper reasoning, emphasizing full-document understanding and verification rather than relevance retrieval.

Contribution

This work presents ScholScan, a new benchmark with annotated questions, evidence localization, and reasoning traces to evaluate MLLMs on scan-oriented academic paper reasoning tasks.
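
As a rough illustration of what one such annotated item could look like, here is a minimal Python sketch. The class names, field names, and example values are all hypothetical and are not taken from the ScholScan release; the only grounded part is that an item pairs a question with evidence localization and a reasoning trace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvidenceSpan:
    """Hypothetical pointer to where supporting evidence sits in a paper."""
    page: int      # 1-based page number (assumed convention)
    snippet: str   # short label or quote identifying the evidence


@dataclass
class ScanItem:
    """Hypothetical benchmark record; field names are illustrative only."""
    paper_id: str
    question: str
    answer: str
    evidence: List[EvidenceSpan] = field(default_factory=list)
    reasoning_trace: List[str] = field(default_factory=list)


def evidence_pages(item: ScanItem) -> List[int]:
    """Return the sorted, de-duplicated pages that an item's evidence touches."""
    return sorted({span.page for span in item.evidence})


# Made-up example: a consistency question whose evidence spans two pages.
example = ScanItem(
    paper_id="demo-paper",
    question="Does the figure quoted in the abstract match the results table?",
    answer="No",
    evidence=[
        EvidenceSpan(page=7, snippet="Table 2"),
        EvidenceSpan(page=1, snippet="abstract, final sentence"),
        EvidenceSpan(page=1, snippet="abstract, first sentence"),
    ],
    reasoning_trace=[
        "Locate the value stated in the abstract.",
        "Compare it with the corresponding entry in Table 2.",
    ],
)
```

Keeping evidence as page-level spans makes it easy to see when an answer depends on cross-checking distant parts of a document, which is the situation a scan-oriented setting is meant to probe.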

Findings

01. MLLMs show systematic deficiencies on scan-oriented tasks.

02. Retrieval-augmented generation methods do not significantly improve performance.

03. ScholScan highlights the challenge of full-document understanding in current models.

Abstract

With the rapid progress of multimodal large language models (MLLMs), AI already performs well at literature retrieval and certain reasoning tasks, serving as a capable assistant to human researchers, yet it remains far from autonomous research. The fundamental reason is that current work on academic paper reasoning is largely confined to a search-oriented paradigm centered on pre-specified targets, with reasoning grounded in relevance retrieval, which struggles to support researcher-style full-document understanding, reasoning, and verification. To bridge this gap, we propose ScholScan, a new benchmark for academic paper reasoning. ScholScan introduces a scan-oriented task setting that asks models to read and cross-check entire papers like human researchers, scanning the document to identify consistency issues. The benchmark comprises 1,800 carefully annotated questions drawn…
