DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards

Aaryaman Kartha; Ahmed Masry; Mohammed Saidul Islam; Thinh Lang; Shadikur Rahman; Ridwan Mahbub; Mizanur Rahman; Mahir Ahmed; Md Rizwan Parvez; Enamul Hoque; Shafiq Joty

arXiv:2508.17398 · cs.CL · August 26, 2025

TL;DR

DashboardQA introduces a new benchmark for evaluating multimodal agents' ability to understand and interact with real-world, interactive dashboards, highlighting current limitations and challenges in this emerging field.

Contribution

This paper presents the first benchmark specifically designed to assess GUI agents' comprehension and interaction capabilities with real-world dashboards, filling a critical gap in existing QA benchmarks.

Findings

1. All evaluated models perform poorly; the best reaches only about 39% accuracy.
2. Current models struggle with grounding, planning, and reasoning on dashboards.
3. Interactive dashboard reasoning remains a significant challenge for vision-language models.

Abstract

Dashboards are powerful visualization tools for data-driven decision-making, integrating multiple interactive views that allow users to explore, filter, and navigate data. Unlike static charts, dashboards support rich interactivity, which is essential for uncovering insights in real-world analytical workflows. However, existing question-answering benchmarks for data visualizations largely overlook this interactivity, focusing instead on static charts. This limitation severely constrains their ability to evaluate the capabilities of modern multimodal agents designed for GUI-based reasoning. To address this gap, we introduce DashboardQA, the first benchmark explicitly designed to assess how vision-language GUI agents comprehend and interact with real-world dashboards. The benchmark includes 112 interactive dashboards from Tableau Public and 405 question-answer pairs with interactive…

Code & Models

Datasets

ahmed-masry/DashboardQA
dataset · 49 downloads

Videos

DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards