SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation

Longteng Guo; Xuanxu Lin; Dongze Hao; Tongtian Yue; Pengkang Huo; Jiatong Ma; Yuchen Liu; Jing Liu

arXiv:2605.10187·cs.CV·May 14, 2026

TL;DR

SciVQR is a multimodal benchmark for evaluating advanced scientific reasoning in multimodal large language models (MLLMs) across multiple disciplines, with an emphasis on complex, multi-step inference and traceable reasoning.

Contribution

Introduces SciVQR, a new multidisciplinary multimodal benchmark in which 46% of tasks include expert-authored solutions, to better evaluate and understand scientific reasoning in MLLMs.

Findings

01 Leading MLLMs show significant limitations in complex reasoning tasks.

02 SciVQR reveals gaps in models' ability to handle interdisciplinary scientific visuals.

03 The benchmark encourages the development of models with improved multi-step reasoning.

Abstract

Scientific reasoning is a key aspect of human intelligence, requiring the integration of multimodal inputs, domain expertise, and multi-step inference across various subjects. Existing benchmarks for multimodal large language models (MLLMs) often fail to capture the complexity and traceability of reasoning processes necessary for rigorous evaluation. To fill this gap, we introduce SciVQR, a multimodal benchmark covering 54 subfields in mathematics, physics, chemistry, geography, astronomy, and biology. SciVQR includes domain-specific visuals, such as equations, charts, and diagrams, and challenges models to combine visual comprehension with reasoning. The tasks range from basic factual recall to complex, multi-step inferences, with 46% including expert-authored solutions. SciVQR not only evaluates final answers but also examines the reasoning process, providing insights into how models…

Code & Models

Repositories

CASIA-IVA-Lab/SciVQR (GitHub)