ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering
Jingxuan Wei, Nan Xu, Junnan Zhu, Yanni Hao, Gaowei Wu, Bihui Yu, Lei Wang

TL;DR
ChartMind introduces a comprehensive benchmark for complex real-world multimodal chart question answering, emphasizing diverse tasks, multilingual contexts, and open-domain outputs to better evaluate vision-language models.
Contribution
The paper presents ChartMind, a new benchmark covering diverse real-world chart analysis tasks and a model-agnostic framework, ChartLLM, for improved reasoning in multimodal models.
Findings
ChartMind covers seven task categories and multilingual contexts.
ChartLLM significantly outperforms existing CQA paradigms.
Flexible chart understanding enhances real-world reasoning accuracy.
Abstract
Chart question answering (CQA) has become a critical multimodal task for evaluating the reasoning capabilities of vision-language models. While early approaches have shown promising performance by focusing on visual features or leveraging large-scale pre-training, most existing evaluations rely on rigid output formats and objective metrics, thus ignoring the complex, real-world demands of practical chart analysis. In this paper, we introduce ChartMind, a new benchmark designed for complex CQA tasks in real-world settings. ChartMind covers seven task categories, incorporates multilingual contexts, supports open-domain textual outputs, and accommodates diverse chart formats, bridging the gap between real-world applications and traditional academic benchmarks. Furthermore, we propose a context-aware yet model-agnostic framework, ChartLLM, that focuses on extracting key contextual elements,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
