ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

Rongtian Ye

arXiv:2603.28902 · cs.AI · May 12, 2026

TL;DR

ChartDiff is a large-scale benchmark designed to evaluate models' ability to perform cross-chart comparative summarization, addressing a gap in existing chart understanding benchmarks.

Contribution

We introduce ChartDiff, the first extensive benchmark for multi-chart comparison, with diverse chart pairs and annotations, to evaluate and improve chart reasoning models.

Findings

1. Frontier general-purpose models achieve the highest GPT-based quality scores.
2. Specialized and pipeline-based models score higher on ROUGE but lower on human-aligned evaluation.
3. Multi-series charts remain challenging for current models.

Abstract

Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively on single-chart interpretation rather than comparative reasoning across multiple charts. To address this gap, we introduce ChartDiff, the first large-scale benchmark for cross-chart comparative summarization. ChartDiff consists of 8,541 chart pairs spanning diverse data sources, chart types, and visual styles, each annotated with LLM-generated and human-verified summaries describing differences in trends, fluctuations, and anomalies. Using ChartDiff, we evaluate general-purpose, chart-specialized, and pipeline-based models. Our results show that frontier general-purpose models achieve the highest GPT-based quality, while specialized and pipeline-based methods obtain higher ROUGE scores but lower human-aligned evaluation, revealing a clear mismatch between lexical overlap…
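The ROUGE-versus-quality mismatch can be illustrated with a small example. The sketch below is not the paper's evaluation code. It computes ROUGE-L F1 from the longest common subsequence (LCS) of lowercase whitespace tokens. That tokenization, the lack of stemming, and the three example summaries are all hypothetical assumptions, since the exact ROUGE variant used in ChartDiff is not stated here. A summary that reuses the reference wording but swaps which chart rises and which falls scores far higher than a correct paraphrase in different words.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # best of skipping either token
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 under an assumed lowercase whitespace tokenization, no stemming."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical comparative summaries of one chart pair.
reference = "Chart A rises steadily while Chart B falls sharply after 2020"
paraphrase = "The first series grows over time whereas the second drops abruptly post 2020"
flipped = "Chart A falls steadily while Chart B rises sharply after 2020"

print(round(rouge_l_f1(reference, paraphrase), 3))  # 0.083
print(round(rouge_l_f1(reference, flipped), 3))     # 0.818
```

The factually wrong "flipped" summary shares 9 of its 11 tokens in order with the reference. The correct paraphrase shares only "2020". A lexical-overlap metric therefore rewards the wrong summary, which is the kind of gap a human-aligned judgment is meant to catch.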

Code & Models

Datasets

ckchaos/ChartDiff (dataset)
