What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets

Jill P. Naiman; Daniel J. Evans; JooYoung Seo

arXiv:2601.22218 · cs.CV · February 2, 2026

TL;DR

This paper argues for a new distribution-based VQA benchmark for scientific charts that tests understanding of the data underlying a chart rather than surface-level visual features, and it provides a synthetic dataset for research.

Contribution

The paper introduces a novel VQA dataset for scientific charts that includes the underlying data, addressing a limitation of existing datasets, which often omit that data or assume a 1-to-1 correspondence between chart marks and data.

Findings

01. Generated synthetic histogram charts with ground truth data

02. Human and model question-answering on charts requiring data access

03. Open-source dataset with figures, data, and annotations

Abstract

Visual Question Answering (VQA) has become an important benchmark for assessing how large multimodal models (LMMs) interpret images. However, most VQA datasets focus on real-world images or simple diagrammatic analysis, with few focused on interpreting complex scientific charts. Indeed, many VQA datasets that analyze charts do not contain the underlying data behind those charts or assume a 1-to-1 correspondence between chart marks and underlying data. In reality, charts are transformations (i.e. analysis, simplification, modification) of data. This distinction introduces a reasoning challenge in VQA that the current datasets do not capture. In this paper, we argue for a dedicated VQA benchmark for scientific charts where there is no 1-to-1 correspondence between chart marks and underlying data. To do so, we survey existing VQA datasets and highlight limitations of the current field. We…
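
The abstract's point that chart marks need not map 1-to-1 onto the underlying data can be illustrated with a small sketch. The helper name `histogram_counts`, the bin edges, and the two toy samples below are all assumptions made for illustration; none of them come from the paper's dataset. The sketch shows two different samples that produce identical histogram bars yet have different medians, so a question about the median cannot be answered exactly from the bars alone.

```python
from statistics import median


def histogram_counts(samples, edges):
    """Count samples per bin.

    Bins are half-open [edges[i], edges[i+1]), except the last bin,
    which also includes its right edge. (Hypothetical helper.)
    """
    counts = [0] * (len(edges) - 1)
    for x in samples:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = edges[i] <= x < edges[i + 1]
            if in_bin or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


# Toy values chosen for illustration only.
edges = [0.0, 1.0, 2.0, 3.0]
data_a = [0.25, 0.5, 1.0, 1.25, 1.5, 2.5]
data_b = [0.5, 0.75, 1.5, 1.75, 1.75, 2.75]

counts_a = histogram_counts(data_a, edges)
counts_b = histogram_counts(data_b, edges)

# Both samples draw the same bars, but their medians differ.
print(counts_a, counts_b)
print(median(data_a), median(data_b))
```

From the bars alone, a reader can only say that the median falls somewhere in the second bin; the exact value depends on data the chart has already summarized away.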

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling