Benchmarking Corruption Robustness of LVLMs: A Discriminative Benchmark and Robustness Alignment Metric

Xiangjie Sui; Songyang Li; Hanwei Zhu; Baoliang Chen; Yuming Fang; Xin Sun

arXiv:2511.19032 · cs.CV · November 25, 2025



Open Access

TL;DR

This paper introduces Bench-C, a discriminative benchmark and Robustness Alignment Score to evaluate and analyze the corruption robustness of large vision-language models, revealing nuanced failure and recovery patterns.

Contribution

The paper proposes a new benchmark and metric for assessing corruption robustness in LVLMs, addressing limitations of existing evaluation methods.

Findings

01

Models show distinct behaviors under corruptions, such as confidence errors.

02

Subtle corruptions can slightly improve accuracy but degrade prediction structure.

03

Decomposing robustness reveals different failure and recovery patterns.

Abstract

Despite the remarkable reasoning abilities of large vision-language models (LVLMs), their robustness under visual corruptions remains insufficiently studied. Existing evaluation paradigms exhibit two major limitations: 1) the dominance of low-discriminative samples in current datasets masks the real robustness gap between models; and 2) conventional accuracy-based metrics fail to capture the degradation of the underlying prediction structure. To bridge these gaps, we introduce Bench-C, a comprehensive benchmark emphasizing discriminative samples for assessing corruption robustness, where a selection strategy is proposed to jointly consider the prediction inconsistency under corruption and the semantic diversity. Furthermore, we propose the Robustness Alignment Score (RAS), a unified metric that measures degradation in logit-level prediction structure by considering the shifts in…
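The abstract contrasts accuracy with logit-level prediction structure, but the RAS formula itself is cut off above. The toy Python sketch below only illustrates that distinction, assuming multiple-choice answers scored by per-option logits: accuracy can rise under a corruption while the output distribution drifts away from the clean one. The KL-based `mean_structure_shift` and the top-1 `flip_rate` are hypothetical stand-ins chosen for illustration; they are not the paper's RAS or its sample-selection strategy, and all numbers are made up.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one sample's option logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(row):
    # Index of the largest logit (the first one wins on ties).
    return max(range(len(row)), key=row.__getitem__)

def accuracy(logit_rows, labels):
    # Conventional accuracy: only the top-1 option is inspected.
    hits = sum(argmax(row) == y for row, y in zip(logit_rows, labels))
    return hits / len(labels)

def kl_divergence(p, q):
    # KL(p || q) between two categorical distributions; q must be strictly positive.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_structure_shift(clean_rows, corrupted_rows):
    # Hypothetical logit-level measure: mean KL from the clean softmax
    # distribution to the corrupted one. NOT the paper's RAS definition.
    shifts = [kl_divergence(softmax(c), softmax(k))
              for c, k in zip(clean_rows, corrupted_rows)]
    return sum(shifts) / len(shifts)

def flip_rate(clean_rows, corrupted_rows):
    # Hypothetical inconsistency score: share of samples whose top-1 answer
    # changes under corruption.
    flips = sum(argmax(c) != argmax(k) for c, k in zip(clean_rows, corrupted_rows))
    return flips / len(clean_rows)

# Toy 4-option multiple-choice logits (made-up numbers, not from the paper).
labels = [0, 1, 2]
clean = [[2.0, 1.0, 0.0, 0.0], [0.5, 0.4, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
corrupted = [[1.2, 1.0, 0.9, 0.9], [0.4, 0.5, 0.0, 0.0], [0.5, 0.5, 1.0, 0.5]]

print(round(accuracy(clean, labels), 3))           # 0.667
print(round(accuracy(corrupted, labels), 3))       # 1.0
print(mean_structure_shift(clean, corrupted) > 0)  # True
print(round(flip_rate(clean, corrupted), 3))       # 0.333
```

In this toy case accuracy goes from 2/3 to 1.0, yet every sample's answer distribution has moved and the correct options in the first and third samples are held with much lower confidence, which is the kind of degradation an accuracy-only metric cannot see.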


Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications