Visualization Literacy of Multimodal Large Language Models: A Comparative Study
Zhimin Li, Haichao Miao, Valerio Pascucci, Shusen Liu

TL;DR
This study evaluates the visualization literacy of multimodal large language models (MLLMs), finding that they perform competitively and surpass humans on specific visualization understanding tasks such as identifying correlations and hierarchical structures.
Contribution
The paper introduces a visualization literacy evaluation framework for MLLMs and compares multiple models against human baselines, highlighting the models' strengths and limitations.
Findings
MLLMs outperform humans in identifying correlations and hierarchical structures.
MLLMs demonstrate competitive visualization literacy performance.
The study provides a new benchmark for assessing MLLMs' visualization understanding.
Abstract
Recently introduced multimodal large language models (MLLMs) combine the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs significantly outpace those of their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and explain the content of the visualization to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly from a visualization-centric perspective. In this work, we aim to fill the gap by utilizing the…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
