Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts
Seon Gyeom Kim, Jae Young Choi, Ryan Rossi, Eunyee Koh, Tak Yeon Lee

TL;DR
This paper introduces Chart-to-Experience, a benchmark dataset for evaluating multimodal large language models' ability to predict the experiential impact of charts, revealing their strengths and limitations in comparison to human judgment.
Contribution
The paper presents a new benchmark dataset and evaluates state-of-the-art MLLMs on their ability to predict and compare the experiential impact of charts, highlighting gaps in current models.
Findings
MLLMs are less sensitive than humans in assessing individual charts.
MLLMs perform well in pairwise chart comparisons.
The gap between the two tasks points to a specific area for improvement: MLLMs' sensitivity when rating charts individually.
Abstract
The field of Multimodal Large Language Models (MLLMs) has made remarkable progress in visual understanding tasks, presenting a vast opportunity to predict the perceptual and emotional impact of charts. However, it also raises concerns, as many applications of LLMs are based on overgeneralized assumptions from a few examples, lacking sufficient validation of their performance and effectiveness. We introduce Chart-to-Experience, a benchmark dataset comprising 36 charts, evaluated by crowdsourced workers for their impact on seven experiential factors. Using the dataset as ground truth, we evaluated capabilities of state-of-the-art MLLMs on two tasks: direct prediction and pairwise comparison of charts. Our findings imply that MLLMs are not as sensitive as human evaluators when assessing individual charts, but are accurate and reliable in pairwise comparisons.
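To make the two evaluation tasks concrete, here is a minimal sketch of how model outputs could be scored against crowdsourced ratings for a single experiential factor. It is illustrative only, not the authors' protocol: the function names, the use of mean absolute error for direct prediction, the rating scale, and the choice to skip tied pairs are all assumptions.

```python
from itertools import combinations


def direct_prediction_error(human_scores, model_scores):
    """Mean absolute error between model and human ratings (assumed metric).

    Both arguments map a chart id to a numeric rating for one factor.
    """
    diffs = [abs(model_scores[c] - human_scores[c]) for c in human_scores]
    return sum(diffs) / len(diffs)


def pairwise_accuracy(human_scores, model_prefers):
    """Fraction of chart pairs where the model picks the chart humans rated higher.

    model_prefers is a callable (chart_a, chart_b) -> preferred chart id.
    Pairs with identical human ratings are skipped (an assumption).
    """
    correct = 0
    total = 0
    for a, b in combinations(sorted(human_scores), 2):
        if human_scores[a] == human_scores[b]:
            continue  # no ground-truth winner for tied pairs
        truth = a if human_scores[a] > human_scores[b] else b
        total += 1
        if model_prefers(a, b) == truth:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical ratings for three charts on one factor (made-up numbers).
human = {"c1": 4.2, "c2": 3.1, "c3": 4.2}
model = {"c1": 4.0, "c2": 3.5, "c3": 3.0}


def prefer_by_score(a, b):
    # Stand-in for asking an MLLM to compare two chart images directly.
    return a if model[a] >= model[b] else b


mae = direct_prediction_error(human, model)
acc = pairwise_accuracy(human, prefer_by_score)
```

In a real evaluation, `prefer_by_score` would be replaced by a call that shows the model both chart images and asks which one has the stronger impact on the factor. Scoring the two tasks separately is what lets a benchmark show a model that is weak at absolute ratings yet reliable at relative judgments.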
