Discovering Failure Modes in Vision-Language Models using RL
Kanishk Jain, Qian Yang, Shravan Nayak, Parisa Kordjamshidi, Nishanth Anand, Aishwarya Agrawal

TL;DR
This paper introduces an RL-based framework that automatically uncovers failure modes in vision-language models by adaptively generating queries to reveal their weaknesses without human bias.
Contribution
The authors propose a novel reinforcement learning approach to automatically discover VLM vulnerabilities, reducing manual effort and uncovering subtle failure modes.
Findings
Framework effectively identifies new failure modes in VLMs.
Method generalizes across different model combinations.
Focuses on fine-grained visual details and skill composition.
Abstract
Vision-language Models (VLMs), despite achieving strong performance on multimodal benchmarks, often misinterpret straightforward visual concepts that humans identify effortlessly, such as counting, spatial reasoning, and viewpoint understanding. Previous studies manually identified these weaknesses and found that they often stem from deficits in specific skills. However, such manual efforts are costly, unscalable, and subject to human bias, which often overlooks subtle details in favour of salient objects, resulting in an incomplete understanding of a model's vulnerabilities. To address these limitations, we propose a Reinforcement Learning (RL)-based framework to automatically discover the failure modes or blind spots of any ``candidate VLM'' on a given data distribution without human intervention. Our framework trains a questioner agent that adaptively generates queries based on the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
