SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images
Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang

TL;DR
SpecVQA introduces a comprehensive benchmark for evaluating multimodal models' ability to understand and reason about scientific spectral images, facilitating progress in scientific visual-language AI.
Contribution
The paper presents a new benchmark with expert-annotated QA pairs for spectral scientific images and proposes a data sampling method to improve model performance.
Findings
A spectral data sampling approach improves model accuracy.
SpecVQA covers 7 spectrum types with 3100 QA pairs.
Benchmark enables evaluation of scientific spectral understanding.
Abstract
Spectra are a prevalent yet highly information-dense form of scientific imagery, and their unstructured, domain-specific character poses substantial challenges to multimodal large language models (MLLMs). Here we introduce SpecVQA, a professional scientific-image benchmark for evaluating multimodal models on scientific spectral understanding, covering 7 representative spectrum types with expert-annotated question-answer pairs. The benchmark has two aims: evaluating scientific QA on spectra and evaluating the corresponding underlying tasks. SpecVQA contains 620 figures and 3100 QA pairs curated from peer-reviewed literature, targeting both direct information extraction and domain-specific reasoning. To reduce token length while preserving essential curve characteristics, we propose a spectral data sampling and interpolation reconstruction approach. Ablation…
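The abstract does not specify how the sampling and reconstruction work. Purely as an illustration of the general idea (shrinking a dense curve into a short text sequence and recovering it afterwards), the sketch below assumes three things: uniform subsampling plus retention of prominent local maxima, serialization of the kept points as compact text, and piecewise-linear interpolation for reconstruction. The function names (`sample_indices`, `serialize`, `reconstruct`), their parameters (`n_uniform`, `min_height`, `digits`) and the synthetic two-peak spectrum are hypothetical. This is not the authors' method.

```python
from bisect import bisect_right
import math


def sample_indices(ys, n_uniform, min_height=None):
    """Hypothetical sampler: n_uniform evenly spaced indices (endpoints included),
    plus every local maximum whose height is at least min_height (if given)."""
    m = len(ys)
    if m < 2 or n_uniform < 2:
        raise ValueError("need at least two points and n_uniform >= 2")
    step = (m - 1) / (n_uniform - 1)
    keep = {round(i * step) for i in range(n_uniform)}
    if min_height is not None:
        # Keep peaks so that peak positions and heights survive subsampling.
        for i in range(1, m - 1):
            is_peak = ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]
            if is_peak and ys[i] >= min_height:
                keep.add(i)
    return sorted(keep)


def serialize(xs, ys, digits=3):
    """Render (x, y) pairs as compact text, i.e. what a model would see as tokens."""
    return ";".join(f"{x:.1f},{y:.{digits}f}" for x, y in zip(xs, ys))


def reconstruct(xs_s, ys_s, xs_query):
    """Piecewise-linear interpolation of the sampled curve onto xs_query.
    xs_s must be strictly increasing; queries outside its range are clamped."""
    out = []
    for x in xs_query:
        j = bisect_right(xs_s, x)
        if j == 0:
            out.append(ys_s[0])
        elif j == len(xs_s):
            out.append(ys_s[-1])
        else:
            x0, x1 = xs_s[j - 1], xs_s[j]
            y0, y1 = ys_s[j - 1], ys_s[j]
            t = (x - x0) / (x1 - x0)
            out.append(y0 + t * (y1 - y0))
    return out


def gaussian(x, center, sigma, amp):
    return amp * math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Synthetic spectrum (illustrative data, not from SpecVQA): two peaks on a 1001-point grid.
xs = [400 + 0.5 * i for i in range(1001)]
ys = [gaussian(x, 650, 10, 1.0) + gaussian(x, 520, 10, 0.5) for x in xs]

idx = sample_indices(ys, n_uniform=50, min_height=0.1)
xs_s = [xs[i] for i in idx]
ys_s = [ys[i] for i in idx]
recon = reconstruct(xs_s, ys_s, xs)

full_text = serialize(xs, ys)
compact_text = serialize(xs_s, ys_s)
max_err = max(abs(a - b) for a, b in zip(recon, ys))
print(len(idx), len(compact_text), len(full_text), round(max_err, 3))
```

Keeping local maxima alongside the uniform grid is one simple way to make sure peak heights are reproduced exactly after reconstruction. The `min_height` threshold keeps noise-level wiggles from inflating the sample. A real system might instead use curvature-adaptive sampling or spline interpolation.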
