FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures
Jiajun Xu, Jiageng Mao, Ang Qi, Weiduo Yuan, Alexander Romanus, Helen Xia, Vitor Campagnolo Guizilini, and Yue Wang

TL;DR
This paper introduces FuzzingRL, a reinforcement-learning-based fuzz-testing method that generates challenging questions to expose failures in vision-language models (VLMs). The generated questions substantially lower the accuracy of target models, and the learned fuzzing policy transfers across models.
Contribution
The paper presents a reinforcement fuzzing approach that automatically generates adversarial queries to reveal VLM failures; the learned fuzzing policy transfers across different models.
Findings
Accuracy of Qwen2.5-VL-32B on the generated questions drops from 86.58% to 65.53% after four RL iterations
Fuzzing policy trained on one VLM transfers effectively to others
Generated queries significantly degrade multiple VLMs' performance
Abstract
Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is critical for ensuring the reliability and safety of AI systems. In this paper, we propose an approach that automatically generates questions designed to deliberately induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The core of this approach lies in fuzz testing and reinforcement fine-tuning: we transform a single input query into a large set of diverse variants through vision and language fuzzing. Based on the fuzzing outcomes, the question generator is further instructed by adversarial reinforcement fine-tuning to produce increasingly challenging queries that trigger model failures. With this approach, we can consistently drive down a target VLM's answer accuracy; for example, the accuracy of Qwen2.5-VL-32B on our generated questions drops from 86.58% to…
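The loop described in the abstract (fuzz one query into variants, score the target model, reinforce whatever triggers failures) can be illustrated with a toy sketch. This is not the paper's implementation: every name below (`MUTATIONS`, `toy_target`, `adversarial_reward`, `train_fuzzer`) is hypothetical, vision fuzzing is omitted, the target is a hand-written stand-in rather than a real VLM, and a simple bandit-style weight update over mutation operators stands in for reinforcement fine-tuning of a question generator.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical language-fuzzing operators; each preserves the ground-truth answer.
MUTATIONS: Dict[str, Callable[[str], str]] = {
    "identity": lambda q: q,
    "verbose": lambda q: q + " Answer with yes or no.",
    "distract": lambda q: "Ignoring the background, " + q[0].lower() + q[1:],
}


def toy_target(question: str) -> str:
    """Stand-in for a VLM: it fails whenever the question opens with a distractor clause."""
    return "no" if question.startswith("Ignoring") else "yes"


def adversarial_reward(model_answer: str, ground_truth: str) -> float:
    """Reward the fuzzer only when the target model answers incorrectly."""
    correct = model_answer.strip().lower() == ground_truth.strip().lower()
    return 0.0 if correct else 1.0


def train_fuzzer(
    seed_question: str,
    ground_truth: str,
    target: Callable[[str], str],
    iterations: int = 4,
    samples: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, float], List[float]]:
    """Bandit-style stand-in for adversarial RL: upweight operators that cause failures."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in MUTATIONS}
    accuracy_per_iteration: List[float] = []
    for _ in range(iterations):
        # Sample operators in proportion to their current weights.
        names = rng.choices(list(weights), weights=list(weights.values()), k=samples)
        rewards = [
            adversarial_reward(target(MUTATIONS[name](seed_question)), ground_truth)
            for name in names
        ]
        # Target accuracy on this batch of fuzzed questions.
        accuracy_per_iteration.append(1.0 - sum(rewards) / samples)
        # Reinforce each operator by the reward it earned.
        for name, reward in zip(names, rewards):
            weights[name] += reward
    return weights, accuracy_per_iteration


if __name__ == "__main__":
    final_weights, accuracies = train_fuzzer("Is there a dog in the image?", "yes", toy_target)
    print(final_weights)
    print(accuracies)
```

In this toy setting only the `distract` operator ever earns reward, so its sampling weight can only grow and the measured accuracy of the stand-in target tends to fall across iterations. That mirrors, in miniature, the trend the abstract reports for a real VLM.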
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Topic Modeling
