CHARTOM: A Visual Theory-of-Mind Benchmark for LLMs on Misleading Charts

Shubham Bharti; Shiyun Cheng; Jihyun Rho; Jianrui Zhang; Mu Cai; Yong Jae Lee; Martina Rau; Xiaojin Zhu

arXiv:2408.14419 · cs.AI · July 1, 2025

TL;DR

CHARTOM is a new benchmark for evaluating multimodal large language models' ability to understand and identify misleading visual data in charts, highlighting current limitations and guiding future improvements.

Contribution

The paper introduces CHARTOM, a visual theory-of-mind benchmark that tests whether multimodal LLMs can both read the factual content of a chart and judge whether the chart would mislead human readers.

Findings

01

Current LLMs struggle with FACT and MIND questions on CHARTOM

02

CHARTOM reveals limitations in models' understanding of misleading visualizations

03

Benchmark provides a calibration method based on human performance

Abstract

We introduce CHARTOM, a visual theory-of-mind benchmark designed to evaluate multimodal large language models' capability to understand and reason about misleading data visualizations through charts. CHARTOM consists of carefully designed charts and associated questions that require a language model to not only correctly comprehend the factual content in the chart (the FACT question) but also judge whether the chart will be misleading to a human reader (the MIND question), a dual capability with significant societal benefits. We detail the construction of our benchmark, including its calibration on human performance and the estimation of the MIND ground truth, called the Human Misleadingness Index. We evaluated several leading LLMs -- including GPT, Claude, Gemini, Qwen, Llama, and Llava series models -- on the CHARTOM dataset and found that it was challenging for all models on both FACT and MIND…

Code & Models

Repositories

ukplab/arxiv2025-misleading-visualizations
pytorch

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · Linear Layer · Weight Decay · Adam · Multi-Head Attention