FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

Jiajun Xu; Jiageng Mao; Ang Qi; Weiduo Yuan; Alexander Romanus; Helen Xia; Vitor Campagnolo Guizilini; and Yue Wang

arXiv:2603.06600·cs.LG·March 10, 2026

Open Access

TL;DR

This paper introduces FuzzingRL, a reinforcement-learning-based fuzz-testing method that generates challenging questions to expose vulnerabilities in vision-language models (VLMs). The generated questions substantially reduce the target model's accuracy, and the learned fuzzing policy transfers across models.

Contribution

The paper presents a reinforcement fuzzing approach that automatically creates adversarial queries to reveal VLM failures, and shows that the resulting fuzzing policy transfers across different models.

Findings

01. On the generated questions, the accuracy of Qwen2.5-VL-32B drops from 86.58% to 65.53% after four RL iterations.

02. A fuzzing policy trained on one VLM transfers effectively to other VLMs.

03. The generated queries significantly degrade the performance of multiple VLMs.

Abstract

Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is critical for ensuring the reliability and safety of AI systems. In this paper, we propose an approach that automatically generates questions designed to deliberately induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The core of this approach lies in fuzz testing and reinforcement fine-tuning: we transform a single input query into a large set of diverse variants through vision and language fuzzing. Based on the fuzzing outcomes, the question generator is then trained with adversarial reinforcement fine-tuning to produce increasingly challenging queries that trigger model failures. With this approach, we can consistently drive down a target VLM's answer accuracy; for example, the accuracy of Qwen2.5-VL-32B on our generated questions drops from 86.58% to…
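The fuzz-then-score loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the mutation operators, the 0/1 reward, and the stand-in target model below are all hypothetical, chosen only to show how one seed question might be expanded into variants and how a failure of the target model could be turned into a reward signal for the question generator. The sketch covers language fuzzing only and assumes every mutation leaves the correct answer unchanged.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical language-fuzzing operators. Each rewrites the question's
# surface form and is assumed to leave the correct answer unchanged.
def add_preamble(q: str) -> str:
    return "Look carefully at the image. " + q

def add_suffix(q: str) -> str:
    return q + " Answer yes or no."

def to_upper(q: str) -> str:
    return q.upper()

MUTATIONS: List[Callable[[str], str]] = [add_preamble, add_suffix, to_upper]

def fuzz(seed_question: str, n_variants: int, rng: random.Random) -> List[str]:
    """Expand one seed question into n_variants mutated questions."""
    variants = []
    for _ in range(n_variants):
        q = seed_question
        # Apply one or two distinct mutations in a random order.
        for op in rng.sample(MUTATIONS, rng.randint(1, 2)):
            q = op(q)
        variants.append(q)
    return variants

def adversarial_reward(target_answer: str, ground_truth: str) -> float:
    """Assumed reward shape: 1.0 if the target model is wrong, else 0.0."""
    same = target_answer.strip().lower() == ground_truth.strip().lower()
    return 0.0 if same else 1.0

def score_variants(
    variants: List[str],
    target_model: Callable[[str], str],
    ground_truth: str,
) -> List[Tuple[str, float]]:
    """Query the target model on each variant and attach its reward."""
    return [(q, adversarial_reward(target_model(q), ground_truth)) for q in variants]

def toy_target_model(question: str) -> str:
    # Stand-in for a VLM: it answers correctly ("yes") unless the whole
    # question is upper-case, which plays the role of a hidden weakness.
    return "no" if question.isupper() else "yes"

if __name__ == "__main__":
    rng = random.Random(0)
    variants = fuzz("Is the cup red?", 8, rng)
    scored = score_variants(variants, toy_target_model, ground_truth="yes")
    failure_rate = sum(r for _, r in scored) / len(scored)
    print(f"failure rate on fuzzed variants: {failure_rate:.2f}")
```

In a full system, per-variant rewards like these would serve as the training signal for fine-tuning the question generator, so that later iterations favor the kinds of rewrites that make the target model fail; that training step is omitted here.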

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Topic Modeling