Revealing Interpretable Failure Modes of VLMs

Isha Chaudhary, Vedaant V Jain, Kavya Sachdeva, Sayan Ranu, Gagandeep Singh

arXiv:2605.12674 · cs.AI · May 14, 2026

TL;DR

REVELIO is a framework that systematically uncovers interpretable failure modes in vision-language models, revealing vulnerabilities in autonomous driving and indoor robotics.

Contribution

REVELIO introduces a novel search-based approach that combines diversity-aware beam search with Gaussian-process Thompson sampling to identify failure modes.
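The summary names Gaussian-process Thompson sampling but gives no details of REVELIO's kernel, candidate encoding, or objective. As a rough illustration only, the Python sketch below runs generic GP Thompson sampling over concept compositions under these assumptions: each composition is a 0/1 vector over a hypothetical list of five concepts, the kernel is an exponentiated Hamming distance, and `toy_failure_rate` is a hypothetical stand-in for measuring how often the target VLM errs on scenes containing those concepts. Each round draws one joint sample from the GP posterior over the not-yet-evaluated compositions and evaluates the composition where that sample is largest.

```python
import itertools
import math
import random


def kernel(a, b, lengthscale=1.0):
    """exp(-Hamming(a, b) / lengthscale): a positive-definite kernel on 0/1 vectors."""
    hamming = sum(x != y for x, y in zip(a, b))
    return math.exp(-hamming / lengthscale)


def cholesky(m):
    """Return lower-triangular L with L L^T = m, for a symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def posterior_sample(X, y, cands, noise=1e-2, jitter=1e-6, rng=random):
    """Draw one joint sample of f at `cands` from a zero-mean GP posterior given (X, y)."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    w = forward_solve(L, y)                                            # L^{-1} y
    V = [forward_solve(L, [kernel(c, x) for x in X]) for c in cands]   # L^{-1} k(X, c)
    mean = [sum(p * q for p, q in zip(v, w)) for v in V]
    cov = [[kernel(a, b) - sum(p * q for p, q in zip(V[i], V[j]))
            + (jitter if i == j else 0.0)
            for j, b in enumerate(cands)] for i, a in enumerate(cands)]
    Lc = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cands]
    return [mean[i] + sum(Lc[i][k] * z[k] for k in range(i + 1))
            for i in range(len(cands))]


def gp_thompson_search(objective, n_concepts, budget, seed=0):
    """Spend `budget` evaluations chosen by GP Thompson sampling; return (rate, bits) best-first."""
    rng = random.Random(seed)
    pool = list(itertools.product((0, 1), repeat=n_concepts))  # all 2**n compositions
    X, y = [], []
    for _ in range(min(budget, len(pool))):
        cands = [c for c in pool if c not in X]
        # Center targets: equivalent to a constant prior mean equal to the running average.
        ybar = sum(y) / len(y) if y else 0.0
        sample = posterior_sample(X, [v - ybar for v in y], cands, rng=rng)
        pick = cands[max(range(len(cands)), key=sample.__getitem__)]
        X.append(pick)
        y.append(objective(pick))
    return sorted(zip(y, X), reverse=True)


def toy_failure_rate(bits):
    """Hypothetical stand-in for the target VLM's error rate on scenes with these concepts.

    bits = (pedestrian_near, rain, night, occlusion, glare); the toy model fails
    mostly when a pedestrian is near at night.
    """
    return 0.9 if bits[0] and bits[2] else 0.1 + 0.1 * bits[1]
```

For example, `gp_thompson_search(toy_failure_rate, n_concepts=5, budget=12)` spends 12 evaluations and returns them ranked by observed failure rate. Sampling a whole function from the posterior, rather than maximizing the posterior mean, is what gives Thompson sampling its built-in exploration: uncertain compositions occasionally receive high samples and get evaluated.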

Findings

1. Uncovered previously unreported vulnerabilities in state-of-the-art VLMs.
2. Identified failure modes such as weak spatial grounding and missed safety hazards.
3. Provided actionable insights for targeted safety improvements.

Abstract

Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and their ability to generalize with minimal task-specific engineering. Despite these advantages, they can fail catastrophically in specific real-world situations; such situations constitute failure modes. We introduce REVELIO, a framework for systematically uncovering interpretable failure modes in VLMs. We define a failure mode as a composition of interpretable, domain-relevant concepts (such as pedestrian proximity or adverse weather conditions) under which a target VLM consistently behaves incorrectly. Identifying such failures requires searching over an exponentially large, discrete combinatorial space. To address this challenge, REVELIO combines two search procedures: a diversity-aware beam search that efficiently maps the failure landscape, and a Gaussian-process…
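The abstract defines a failure mode as a composition of concepts under which the VLM consistently fails; with n concepts there are 2^n candidate compositions, so exhaustive evaluation quickly becomes infeasible. The sketch below is a generic diversity-penalized beam search, not the authors' algorithm. Assumptions: compositions are frozensets of hypothetical concept names, `failure_rate` is a hypothetical black box (for instance, the fraction of generated scenes on which the VLM answers incorrectly), "consistently" is approximated by a failure-rate `threshold`, and diversity is enforced by penalizing Jaccard overlap with compositions already kept in the beam.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Composition = FrozenSet[str]


def jaccard(a: Composition, b: Composition) -> float:
    """Overlap between two compositions (1.0 = identical concept sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diverse_top_k(scored: List[Tuple[Composition, float]], k: int,
                  penalty: float) -> List[Tuple[Composition, float]]:
    """Greedily keep k items, discounting each by its max Jaccard overlap with those kept."""
    chosen: List[Tuple[Composition, float]] = []
    remaining = list(scored)
    while remaining and len(chosen) < k:
        def adjusted(item: Tuple[Composition, float]) -> float:
            overlap = max((jaccard(item[0], c) for c, _ in chosen), default=0.0)
            return item[1] - penalty * overlap
        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def diverse_beam_search(concepts: Sequence[str],
                        failure_rate: Callable[[Composition], float],
                        beam_width: int = 3, max_size: int = 3,
                        penalty: float = 0.5,
                        threshold: float = 0.8) -> List[Tuple[Composition, float]]:
    """Grow compositions one concept per level; report those with failure rate >= threshold."""
    beam: List[Composition] = [frozenset()]
    found: Dict[Composition, float] = {}
    for _ in range(max_size):
        children = {parent | {c} for parent in beam for c in concepts if c not in parent}
        # Sort for a deterministic order (frozenset iteration order is hash-dependent).
        scored = [(comp, failure_rate(comp)) for comp in sorted(children, key=sorted)]
        for comp, rate in scored:
            if rate >= threshold:  # "consistently fails" approximated by a threshold
                found[comp] = rate
        beam = [comp for comp, _ in diverse_top_k(scored, beam_width, penalty)]
    return sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


def toy_failure_rate(comp: Composition) -> float:
    """Hypothetical stand-in for the VLM's error rate on scenes containing `comp`."""
    if {"pedestrian_near", "night"} <= comp:
        return 0.9
    if {"glare", "rain"} <= comp:
        return 0.85
    return 0.1
```

With `penalty=0.0` this reduces to ordinary top-k beam search; larger penalties push the beam toward compositions that share fewer concepts, which is one way to cover distinct regions of the failure landscape instead of many near-duplicates of a single mode. Here, `diverse_beam_search(["pedestrian_near", "rain", "night", "glare"], toy_failure_rate, max_size=2)` reports the two planted modes, {pedestrian_near, night} and {glare, rain}.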
