ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering

Jingxuan Wei; Nan Xu; Junnan Zhu; Yanni Hao; Gaowei Wu; Bihui Yu; Lei Wang

arXiv:2505.23242 · cs.CL · May 30, 2025

TL;DR

ChartMind introduces a comprehensive benchmark for complex real-world multimodal chart question answering, emphasizing diverse tasks, multilingual contexts, and open-domain outputs to better evaluate vision-language models.

Contribution

The paper presents two things: ChartMind, a new benchmark covering diverse real-world chart analysis tasks, and ChartLLM, a context-aware, model-agnostic framework for improving reasoning in multimodal models.

Findings

1. ChartMind covers seven task categories and multilingual contexts.
2. ChartLLM significantly outperforms existing CQA paradigms.
3. Flexible chart understanding enhances real-world reasoning accuracy.

Abstract

Chart question answering (CQA) has become a critical multimodal task for evaluating the reasoning capabilities of vision-language models. While early approaches have shown promising performance by focusing on visual features or leveraging large-scale pre-training, most existing evaluations rely on rigid output formats and objective metrics, thus ignoring the complex, real-world demands of practical chart analysis. In this paper, we introduce ChartMind, a new benchmark designed for complex CQA tasks in real-world settings. ChartMind covers seven task categories, incorporates multilingual contexts, supports open-domain textual outputs, and accommodates diverse chart formats, bridging the gap between real-world applications and traditional academic benchmarks. Furthermore, we propose a context-aware yet model-agnostic framework, ChartLLM, that focuses on extracting key contextual elements,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning