SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

Jialu Shen; Han Lyu; Suyang Zhong; Hanzheng Li; Haoyi Tao; Nan Wang; Changhong Chen; Xi Fang

arXiv:2604.28039·cs.AI·May 1, 2026

TL;DR

SpecVQA introduces a comprehensive benchmark for evaluating multimodal models' ability to understand and reason about scientific spectral images, facilitating progress in scientific visual-language AI.

Contribution

The paper presents a new benchmark with expert-annotated QA pairs for spectral scientific images and proposes a data sampling method to improve model performance.

Findings

01 A spectral data sampling approach improves model accuracy.

02 SpecVQA covers 7 spectrum types, with 620 figures and 3100 QA pairs.

03 The benchmark enables evaluation of scientific spectral understanding.

Abstract

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-specific characteristics. Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal models on scientific spectral understanding, covering 7 representative spectrum types with expert-annotated question-answer pairs. The benchmark has two aims: evaluating scientific QA on spectra and evaluating the corresponding underlying tasks. SpecVQA contains 620 figures and 3100 QA pairs curated from peer-reviewed literature, targeting both direct information extraction and domain-specific reasoning. To effectively reduce token length while preserving essential curve characteristics, we propose a spectral data sampling and interpolation reconstruction approach. Ablation…
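
The abstract does not say how the sampling and interpolation reconstruction are carried out, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes uniform index sampling and linear interpolation; the function names `uniform_sample` and `linear_reconstruct`, the synthetic two-peak curve, and the 2000-to-200 point reduction are all hypothetical choices made for illustration.

```python
import math
from bisect import bisect_right


def uniform_sample(xs, ys, n_points):
    """Keep n_points roughly evenly spaced samples, always including both endpoints.

    Assumption: uniform index sampling; the paper may use a different scheme.
    """
    if n_points < 2 or n_points > len(xs):
        raise ValueError("n_points must be between 2 and len(xs)")
    last = len(xs) - 1
    idx = sorted({round(i * last / (n_points - 1)) for i in range(n_points)})
    return [xs[i] for i in idx], [ys[i] for i in idx]


def linear_reconstruct(sample_xs, sample_ys, query_xs):
    """Rebuild y values at query_xs by linear interpolation between samples.

    Assumption: piecewise-linear reconstruction; sample_xs must be strictly increasing.
    """
    out = []
    for q in query_xs:
        # Locate the segment [x0, x1] that contains q, clamped to valid segments.
        j = bisect_right(sample_xs, q) - 1
        j = min(max(j, 0), len(sample_xs) - 2)
        x0, x1 = sample_xs[j], sample_xs[j + 1]
        t = (q - x0) / (x1 - x0)
        out.append(sample_ys[j] + t * (sample_ys[j + 1] - sample_ys[j]))
    return out


# Synthetic curve with two Gaussian-shaped peaks (illustrative data, not from SpecVQA).
xs = [400.0 + i * 3600.0 / 1999 for i in range(2000)]
ys = [
    math.exp(-((x - 1700.0) / 40.0) ** 2) + 0.5 * math.exp(-((x - 2900.0) / 80.0) ** 2)
    for x in xs
]

sx, sy = uniform_sample(xs, ys, 200)      # 10x fewer points to serialize as text
rec = linear_reconstruct(sx, sy, xs)      # rebuild the curve on the original grid
max_err = max(abs(a - b) for a, b in zip(ys, rec))
print(f"kept {len(sx)} of {len(xs)} points, max abs reconstruction error = {max_err:.4f}")
```

Comparing `max_err` across different values of `n_points` gives a rough sense of the trade-off the abstract describes: fewer points mean fewer tokens, but narrow peaks are reconstructed less faithfully.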

Code & Models

Datasets

UniParser/SpecVQA
