CHAOS: Chart Analysis with Outlier Samples
Omar Moured, Yufan Chen, Ruiping Liu, Simon Reiß, Philip Torr, Jiaming Zhang, Rainer Stiefelhagen

TL;DR
CHAOS is a benchmark designed to evaluate the robustness of multimodal large language models in interpreting noisy and perturbed charts, covering various textual and visual disruptions at multiple severity levels.
Contribution
This work introduces CHAOS, a comprehensive benchmark for assessing MLLMs' robustness to chart perturbations, including a detailed analysis across different models and downstream tasks.
Findings
Models show varying robustness levels across perturbation types.
Chart-specific models outperform general models in noisy conditions.
Perturbation severity significantly impacts model performance.
Abstract
Charts play a critical role in data analysis and visualization, yet real-world applications often present charts with challenging or noisy features. Such "outlier charts" pose a substantial challenge even for Multimodal Large Language Models (MLLMs), which can struggle to interpret perturbed charts. In this work, we introduce CHAOS (CHart Analysis with Outlier Samples), a robustness benchmark for systematically evaluating MLLMs against chart perturbations. CHAOS encompasses five types of textual and ten types of visual perturbations, each presented at three levels of severity (easy, mid, hard) informed by the results of a human evaluation study. The benchmark evaluates 13 state-of-the-art MLLMs, divided into three groups (i.e., general-, document-, and chart-specific models) according to their training scope and data. The comprehensive analysis involves two downstream tasks (ChartQA and…
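To make the size of the benchmark's perturbation space concrete, the sketch below enumerates every combination of perturbation type and severity level described in the abstract. Only the counts (5 textual types, 10 visual types, 3 severities, 13 models) come from the abstract. The type names `text_1`, `visual_1`, and so on, and the helper `perturbation_grid`, are hypothetical placeholders, since the actual perturbation names are not listed here. The sketch also does not count any unperturbed baseline the benchmark may include.

```python
from itertools import product

# Counts taken from the abstract; the type names below are hypothetical placeholders.
NUM_TEXTUAL = 5
NUM_VISUAL = 10
SEVERITIES = ("easy", "mid", "hard")
NUM_MODELS = 13

textual = [f"text_{i}" for i in range(1, NUM_TEXTUAL + 1)]
visual = [f"visual_{i}" for i in range(1, NUM_VISUAL + 1)]


def perturbation_grid():
    """Return every (modality, perturbation, severity) triple (hypothetical helper)."""
    grid = []
    for modality, names in (("textual", textual), ("visual", visual)):
        # Pair each perturbation type with each of the three severity levels.
        for name, level in product(names, SEVERITIES):
            grid.append((modality, name, level))
    return grid


grid = perturbation_grid()
print(len(grid))               # 45 perturbed settings: (5 + 10) * 3
print(len(grid) * NUM_MODELS)  # 585 model-setting pairs per downstream task
```

Under these assumptions, each downstream task yields (5 + 10) × 3 = 45 perturbed settings. Across the 13 models, that comes to 585 model-setting pairs per task.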
