CHARTOM: A Visual Theory-of-Mind Benchmark for LLMs on Misleading Charts
Shubham Bharti, Shiyun Cheng, Jihyun Rho, Jianrui Zhang, Mu Cai, Yong Jae Lee, Martina Rau, Xiaojin Zhu

TL;DR
CHARTOM is a new benchmark for evaluating multimodal large language models' ability to understand and identify misleading visual data in charts, highlighting current limitations and guiding future improvements.
Contribution
The paper introduces CHARTOM, a novel visual theory-of-mind benchmark that assesses whether LLMs understand misleading charts, a capability with societal implications.
Findings
Current LLMs struggle with FACT and MIND questions on CHARTOM
CHARTOM reveals limitations in models' understanding of misleading visualizations
Benchmark provides a calibration method based on human performance
Abstract
We introduce CHARTOM, a visual theory-of-mind benchmark designed to evaluate multimodal large language models' capability to understand and reason about misleading data visualizations through charts. CHARTOM consists of carefully designed charts and associated questions that require a language model to not only correctly comprehend the factual content in the chart (the FACT question) but also judge whether the chart will be misleading to a human reader (the MIND question), a dual capability with significant societal benefits. We detail the construction of our benchmark including its calibration on human performance and estimation of MIND ground truth called the Human Misleadingness Index. We evaluated several leading LLMs -- including GPT, Claude, Gemini, Qwen, Llama, and Llava series models -- on the CHARTOM dataset and found that it was challenging to all models both on FACT and MIND…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · Linear Layer · Weight Decay · Adam · Multi-Head Attention
