PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs

Yanjun Zhao; Tianxin Wei; Jiaru Zou; Xuying Ning; Yuanchen Bei; Lingjie Chen; Simmi Rana; Wendy H. Yang; Hanghang Tong; Jingrui He

arXiv:2604.21304 · cs.IR · April 29, 2026

TL;DR

PaperMind is a benchmark for evaluating the integrated agentic reasoning and critique abilities of multimodal LLMs over scientific papers from seven domains.

Contribution

It introduces a unified benchmark of four complementary task families for assessing complex scientific reasoning and critique in multimodal language models.

Findings

01 Models show consistent performance gaps across tasks.

02 Existing models struggle with integrated scientific reasoning.

03 The benchmark reveals persistent challenges in multimodal scientific understanding.

Abstract

Understanding scientific papers requires more than answering isolated questions or summarizing content. It involves an integrated reasoning process that grounds textual and visual information, interprets experimental evidence, synthesizes information across sources, and critically evaluates scientific claims. However, existing benchmarks typically assess these abilities in isolation, making it difficult to evaluate scientific paper understanding as a unified set of interacting cognitive abilities. In this work, we introduce PaperMind, a benchmark designed to evaluate integrated and agent-oriented scientific reasoning over research papers. PaperMind is constructed from real scientific papers across seven domains: agriculture, biology, chemistry, computer science, medicine, physics, and economics. It comprises four complementary task families that collectively operationalize…

Code & Models

Repositories

Yanjun-Zhao/PaperMind
github

Datasets

yj-zhao/PaperMind
dataset · 1.1k downloads
