Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts
Azher Ahmed Efat, Seok Hwan Song, Wallapak Tavanapong

TL;DR
This paper introduces PolyChartQA, a new dataset for question answering on multi-chart images, and uses it to evaluate how well state-of-the-art multimodal language models understand multiple related charts presented together.
Contribution
The paper presents PolyChartQA, a benchmark dataset for multi-chart question answering, and assesses the performance of nine advanced multimodal language models on this task.
Findings
Models show a 27.4% drop in LLM-based accuracy (L-Accuracy) on human-authored questions compared to MLM-generated questions.
A prompting method proposed by the authors yields a 5.39% L-Accuracy gain.
PolyChartQA enables evaluation of multi-chart understanding in AI models.
Abstract
Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together. Understanding of multi-chart images has not been extensively explored. We introduce PolyChartQA, a mid-scale dataset specifically designed for question answering over multi-chart images. PolyChartQA comprises 534 multi-chart images (with a total of 2,297 sub-charts) sourced from peer-reviewed computer science research publications and 2,694 QA pairs. We evaluate the performance of nine state-of-the-art Multimodal Language Models (MLMs) on PolyChartQA across question type, difficulty, question source, and key structural characteristics of multi-charts. Our results show a 27.4% LLM-based accuracy (L-Accuracy) drop on human-authored questions compared to MLM-generated questions, and a 5.39% L-Accuracy gain with our…
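The breakdown by question source described in the abstract can be illustrated with a small sketch. It is not the authors' evaluation code: it assumes each QA pair has already been given a boolean verdict by an LLM judge, and the field names `source` and `judged_correct`, the helper names, and the toy records are all hypothetical. It also assumes the drop is reported as an absolute difference in percentage points, which the truncated abstract does not confirm.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: fraction of records judged correct} for a grouping key."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        # "judged_correct" is a hypothetical field holding the LLM judge's verdict.
        correct[group] += int(rec["judged_correct"])
    return {group: correct[group] / totals[group] for group in totals}


def accuracy_drop(acc, baseline, target):
    """Difference baseline - target in percentage points (an assumed convention)."""
    return 100.0 * (acc[baseline] - acc[target])


# Toy records for illustration only; these are not PolyChartQA results.
records = [
    {"source": "mlm", "judged_correct": True},
    {"source": "mlm", "judged_correct": True},
    {"source": "mlm", "judged_correct": True},
    {"source": "mlm", "judged_correct": False},
    {"source": "human", "judged_correct": True},
    {"source": "human", "judged_correct": False},
    {"source": "human", "judged_correct": False},
    {"source": "human", "judged_correct": False},
]

by_source = accuracy_by_group(records, "source")
drop = accuracy_drop(by_source, baseline="mlm", target="human")
```

The same `accuracy_by_group` helper could be reused with a different key (for example, a hypothetical `question_type` or `difficulty` field) to produce the other breakdowns the abstract mentions.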
