CHAOS: Chart Analysis with Outlier Samples

Omar Moured; Yufan Chen; Ruiping Liu; Simon Reiß; Philip Torr; Jiaming Zhang; Rainer Stiefelhagen

arXiv:2505.17235 · cs.CV · May 26, 2025

TL;DR

CHAOS is a benchmark designed to evaluate the robustness of multimodal large language models in interpreting noisy and perturbed charts, covering various textual and visual disruptions at multiple severity levels.

Contribution

This work introduces CHAOS, a comprehensive benchmark for assessing MLLMs' robustness to chart perturbations, including a detailed analysis across different models and downstream tasks.

Findings

01 Models show varying robustness levels across perturbation types.

02 Chart-specific models outperform general models in noisy conditions.

03 Perturbation severity significantly impacts model performance.

Abstract

Charts play a critical role in data analysis and visualization, yet real-world applications often present charts with challenging or noisy features. Such "outlier charts" pose a substantial challenge even for Multimodal Large Language Models (MLLMs), which can struggle to interpret perturbed charts. In this work, we introduce CHAOS (CHart Analysis with Outlier Samples), a robustness benchmark to systematically evaluate MLLMs against chart perturbations. CHAOS encompasses five types of textual and ten types of visual perturbations, each presented at three levels of severity (easy, mid, hard) informed by the results of a human evaluation study. The benchmark evaluates 13 state-of-the-art MLLMs, divided into three groups (i.e., general-, document-, and chart-specific models) according to their training scope and data. Comprehensive analysis involves two downstream tasks (ChartQA and…
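
To make the size of the evaluation grid concrete, the counts stated in the abstract (five textual and ten visual perturbations, each at three severity levels) can be enumerated as in the minimal sketch below. The perturbation names are hypothetical placeholders, since the abstract does not list the individual types, and the code is not taken from the CHAOS release.

```python
from itertools import product

# Counts come from the abstract. The names are hypothetical placeholders:
# the abstract does not name the individual perturbation types.
TEXTUAL = [f"textual_{i}" for i in range(1, 6)]   # 5 textual perturbations
VISUAL = [f"visual_{i}" for i in range(1, 11)]    # 10 visual perturbations
SEVERITIES = ["easy", "mid", "hard"]              # 3 severity levels


def build_conditions():
    """Return every (perturbation, severity) pair in the grid."""
    return list(product(TEXTUAL + VISUAL, SEVERITIES))


conditions = build_conditions()
print(len(conditions))  # 15 perturbations x 3 severities = 45
```

Under these counts, each chart can appear in up to 45 perturbed conditions in addition to its clean version.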

Code & Models

Datasets

omoured/CHAOS
dataset · 234 downloads
