Revealing Interpretable Failure Modes of VLMs
Isha Chaudhary, Vedaant V Jain, Kavya Sachdeva, Sayan Ranu, Gagandeep Singh

TL;DR
REVELIO is a framework that systematically uncovers interpretable failure modes in vision-language models, revealing vulnerabilities in autonomous driving and indoor robotics.
Contribution
REVELIO introduces a search-based approach that combines a diversity-aware beam search with Gaussian-process Thompson Sampling to identify failure modes.
Findings
Uncovered previously unreported vulnerabilities in state-of-the-art VLMs.
Identified failure modes such as weak spatial grounding and safety hazard misses.
Provided actionable insights for targeted safety improvements.
Abstract
Vision-Language Models (VLMs) are increasingly used in safety-critical applications because of their broad reasoning capabilities and ability to generalize with minimal task-specific engineering. Despite these advantages, they can exhibit catastrophic failures in specific real-world situations, constituting failure modes. We introduce REVELIO, a framework for systematically uncovering interpretable failure modes in VLMs. We define a failure mode as a composition of interpretable, domain-relevant concepts (such as pedestrian proximity or adverse weather conditions) under which a target VLM consistently behaves incorrectly. Identifying such failures requires searching over an exponentially large discrete combinatorial space. To address this challenge, REVELIO combines two search procedures: a diversity-aware beam search that efficiently maps the failure landscape, and a Gaussian-process…
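To make the idea of searching over concept compositions more concrete, here is a minimal Python sketch of one way a diversity-aware beam search could be set up. It is not the REVELIO implementation, and the abstract does not specify these details. The concept vocabulary (CONCEPTS), the scoring stub (toy_failure_rate), the Jaccard-based diversity penalty, and the parameters beam_width, depth, and lam are all assumptions made for illustration. In a real setting the scoring stub would be replaced by an estimate of how often the target VLM fails on inputs matching a composition.

```python
# Hypothetical concept vocabulary: attribute -> possible values (illustrative only).
CONCEPTS = {
    "weather": ["clear", "rain", "fog"],
    "pedestrian": ["near", "far"],
    "time": ["day", "night"],
}


def toy_failure_rate(combo):
    """Stand-in for querying a target VLM; returns a made-up failure rate."""
    rate = 0.05
    if combo.get("weather") == "fog":
        rate += 0.4
    if combo.get("pedestrian") == "near":
        rate += 0.3
    if combo.get("time") == "night":
        rate += 0.2
    return min(rate, 1.0)


def expand(combo):
    """Yield every composition that adds one unused attribute to combo."""
    for attr, values in CONCEPTS.items():
        if attr not in combo:
            for value in values:
                yield {**combo, attr: value}


def jaccard(a, b):
    """Overlap between two compositions, viewed as sets of (attribute, value)."""
    sa, sb = set(a.items()), set(b.items())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_diverse(candidates, score, k, lam):
    """Greedily pick k candidates, penalizing similarity to those already picked."""
    remaining = list(candidates)
    chosen = []
    while remaining and len(chosen) < k:
        def adjusted(c):
            penalty = max((jaccard(c, s) for s in chosen), default=0.0)
            return score(c) - lam * penalty
        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def diverse_beam_search(score, beam_width=2, depth=2, lam=0.5):
    """Grow compositions one concept at a time, keeping a diverse beam."""
    beam = [{}]
    for _ in range(depth):
        # Deduplicate children reached from different parents.
        candidates = {}
        for combo in beam:
            for child in expand(combo):
                candidates[frozenset(child.items())] = child
        beam = select_diverse(candidates.values(), score, beam_width, lam)
    return beam


if __name__ == "__main__":
    for combo in diverse_beam_search(toy_failure_rate):
        print(combo, toy_failure_rate(combo))
```

The penalty weight lam trades off raw failure rate against coverage: with lam set to 0 this reduces to an ordinary beam search, while larger values push the beam toward compositions that share fewer concepts with those already selected.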
