ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
Ruiran Su, Jiasheng Si, Zhijiang Guo, Janet B. Pierrehumbert

TL;DR
ClimateViz is a large-scale benchmark dataset for scientific fact-checking on charts, revealing that current multimodal models struggle with chart-based reasoning and highlighting the need for improved AI systems.
Contribution
This paper introduces ClimateViz, the first comprehensive benchmark for scientific fact-checking on visual charts, including structured explanations and extensive evaluation of state-of-the-art models.
Findings
Even the best models, such as Gemini 2.5 and InternVL 2.5, reach only 76.2 to 77.8% accuracy, below human performance.
Explanation-augmented outputs can improve model performance.
Current models struggle with complex chart-based reasoning.
Abstract
Scientific fact-checking has mostly focused on text and tables, overlooking scientific charts, which are key for presenting quantitative evidence and statistical reasoning. We introduce ClimateViz, the first large-scale benchmark for scientific fact-checking using expert-curated scientific charts. ClimateViz contains 49,862 claims linked to 2,896 visualizations, each labeled as support, refute, or not enough information. To improve interpretability, each example includes structured knowledge graph explanations covering trends, comparisons, and causal relations. We evaluate state-of-the-art multimodal language models, including both proprietary and open-source systems, in zero-shot and few-shot settings. Results show that current models struggle with chart-based reasoning: even the best systems, such as Gemini 2.5 and InternVL 2.5, reach only 76.2 to 77.8 percent accuracy in label-only…
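The abstract describes each example as a claim linked to a chart, carrying one of three labels plus a structured explanation. As a minimal sketch of what such a record and a label-only accuracy score might look like, consider the Python below. The class name `ChartClaim`, its field names, the triple format for explanations, and the toy data are all assumptions made for illustration; they are not the dataset's actual schema or the authors' evaluation code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The three labels named in the abstract.
LABELS = ("support", "refute", "not enough information")


@dataclass
class ChartClaim:
    """Hypothetical record: one claim about one chart (field names are assumed)."""
    chart_id: str
    claim: str
    label: str
    # Assumed explanation format: (subject, relation, object) triples,
    # e.g. describing a trend, a comparison, or a causal relation.
    explanation: List[Tuple[str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def label_accuracy(gold: List[ChartClaim], predicted: List[str]) -> float:
    """Fraction of claims whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of claims")
    correct = sum(g.label == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy, made-up records purely for illustration (not taken from ClimateViz).
examples = [
    ChartClaim("chart-001", "toy claim A", "support",
               [("temperature", "trend", "increasing")]),
    ChartClaim("chart-001", "toy claim B", "refute"),
    ChartClaim("chart-002", "toy claim C", "not enough information"),
]
predictions = ["support", "support", "not enough information"]
accuracy = label_accuracy(examples, predictions)
```

Several claims can share one `chart_id`, which mirrors the abstract's ratio of 49,862 claims to 2,896 visualizations. Scoring only the label, as here, ignores the explanation triples entirely.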
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Data Visualization and Analytics
