VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering

Yuyi Li, Daoyuan Chen, Zhen Wang, Yutong Lu, and Yaliang Li

arXiv:2511.19899 · cs.CV · February 12, 2026

TL;DR

VeriSciQA is a large-scale dataset for scientific visual question answering, curated with a cross-modal verification framework that generates question-answer pairs from figure-citing paragraphs and then checks them against the figures to filter out erroneous pairs.

Contribution

The paper introduces VeriSciQA, a dataset for Scientific Visual Question Answering (SVQA) built with a cross-modal verification framework that filters out erroneous QA pairs, improving data quality for scientific visual reasoning tasks.

Findings

1. Models fine-tuned on VeriSciQA outperform those trained on previous datasets.
2. There is a significant accuracy gap between open-source and proprietary models on SVQA.
3. Scaling data with VeriSciQA enhances model performance on SVQA benchmarks.

Abstract

Large Vision-Language Models (LVLMs) show promise for scientific applications, yet open-source models still struggle with Scientific Visual Question Answering (SVQA), namely answering questions about figures from scientific papers. A key bottleneck is the lack of public, large-scale, high-quality SVQA datasets. Although recent work uses LVLMs to synthesize data at scale, we identify systematic errors in their resulting QA pairs, stemming from LVLMs' inherent limitations and information asymmetry between figures and text. To address these challenges, we propose a Cross-Modal verification framework that generates questions and answers purely from figure-citing paragraphs, then verifies them against the figures themselves, leveraging the inherent text-figure alignment in scientific papers to filter out erroneous QA pairs. We instantiate this framework to curate VeriSciQA, a dataset of…
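The generate-then-verify idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `QAPair` type, the `curate` function, and the toy generator and verifier are hypothetical names introduced here, and the string-matching stubs stand in for the model calls a real pipeline would make. The only point shown is the control flow: candidates come from the citing paragraph alone, and a separate check against the figure decides which ones are kept.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class QAPair:
    figure_id: str
    question: str
    answer: str


# Hypothetical interfaces; a real pipeline would wrap model calls here.
Generator = Callable[[str, str], List[QAPair]]  # (figure_id, citing paragraph) -> candidates
Verifier = Callable[[str, QAPair], bool]        # (figure_id, candidate) -> consistent with figure?


def curate(records: Iterable[Tuple[str, str]],
           generate: Generator,
           verify: Verifier) -> List[QAPair]:
    """Generate QA pairs from text only, then keep those the figure check accepts."""
    kept: List[QAPair] = []
    for figure_id, paragraph in records:
        for qa in generate(figure_id, paragraph):
            if verify(figure_id, qa):
                kept.append(qa)
    return kept


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_generate(figure_id: str, paragraph: str) -> List[QAPair]:
    # Pretend the text-only generator reads a claim of the form "<subject> reaches <value>."
    subject, _, value = paragraph.partition(" reaches ")
    return [QAPair(figure_id, f"What value does {subject} reach?", value.rstrip("."))]


# Pretend these are the values actually readable from each figure.
FIGURE_FACTS = {"fig1": "0.9", "fig2": "0.5"}


def toy_verify(figure_id: str, qa: QAPair) -> bool:
    # Accept a candidate only if its answer matches what the figure shows.
    return FIGURE_FACTS.get(figure_id) == qa.answer


records = [
    ("fig1", "Model A reaches 0.9."),  # text agrees with the figure
    ("fig2", "Model B reaches 0.7."),  # text disagrees with the figure
]
kept = curate(records, toy_generate, toy_verify)
print(kept)
```

In this toy run, only the pair for `fig1` survives, because the answer derived from the `fig2` paragraph does not match the figure. Passing the generator and verifier as callables keeps the filtering loop independent of whichever models are used for each step.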

Code & Models

Datasets

datajuicer/VeriSciQA
dataset · 26 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques