Chart Question Answering from Real-World Analytical Narratives
Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha

TL;DR
This paper introduces a new chart question answering dataset built from real-world analytical narratives and shows that current multimodal models struggle with the authentic reasoning it demands.
Contribution
It provides a novel, ecologically valid dataset from visualization notebooks and benchmarks current models, revealing significant performance gaps.
Findings
GPT-4.1 achieves 69.3% accuracy on the dataset.
The dataset reflects real-world reasoning workflows.
Current models struggle with authentic CQA tasks.
Abstract
We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.
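The abstract reports a single accuracy figure, so it may help to see how such a score could be computed. The following is a minimal sketch, not the authors' evaluation code: it assumes exact-match scoring after simple normalization, and the `CQAExample` record, the `normalize` and `accuracy` helpers, the file names, and the sample questions are all hypothetical placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CQAExample:
    """One hypothetical benchmark item: a chart image, a question, and a reference answer."""
    chart_path: str
    question: str
    reference: str


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(answer.lower().split())


def accuracy(examples, predictions) -> float:
    """Fraction of predictions that equal the reference answer after normalization."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(ex.reference)
        for ex, pred in zip(examples, predictions)
    )
    return correct / len(examples)


# Illustrative data only; these items are not taken from the dataset.
examples = [
    CQAExample("chart_01.png", "Which region has the highest value?", "North"),
    CQAExample("chart_02.png", "Does the trend rise after 2010?", "Yes"),
    CQAExample("chart_03.png", "Which category is smallest?", "Rail"),
]
predictions = ["north", "No", " Rail "]  # stand-ins for model outputs

score = accuracy(examples, predictions)
print(f"accuracy = {score:.1%}")
```

Exact match is only one possible scoring rule. Free-form answers grounded in analytical narratives may call for a more lenient comparison, and the paper itself should be consulted for the protocol behind the 69.3% figure.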
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Data Visualization and Analytics
