Simulated Ensemble Attack: Transferring Jailbreaks Across Fine-tuned Vision-Language Models
Ruofan Wang, Xin Wang, Yang Yao, Juncheng Li, Xuan Tong, Xingjun Ma

TL;DR
This paper introduces the Simulated Ensemble Attack (SEA), a grey-box method that exploits jailbreak vulnerabilities in a base vision-language model (VLM) to transfer attacks to models fine-tuned from it, exposing a security risk in the common practice of fine-tuning open-source VLMs.
Contribution
The paper proposes SEA, a transfer attack framework that simulates bounded parameter variations in the vision encoder and uses auxiliary textual guidance to stabilize adversarial optimization, substantially improving jailbreak transferability across fine-tuned VLMs.
Findings
SEA achieves high transfer success and toxicity rates across diverse fine-tuned variants, including safety-enhanced models.
Fine-tuning causes localized parameter shifts around the base model.
SEA generalizes across different base model generations.
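To make the "localized parameter shift" finding concrete, one simple way to quantify it, assuming both checkpoints are available as flat lists of floats per layer, is the relative L2 distance between fine-tuned and base weights. The helpers `relative_shift` and `per_layer_shift` below are hypothetical illustrations; this summary does not state which metric the authors actually use.

```python
import math


def relative_shift(base, tuned):
    """Relative L2 distance ||tuned - base|| / ||base|| between two parameter vectors."""
    if len(base) != len(tuned):
        raise ValueError("parameter vectors must have the same length")
    diff = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    norm = math.sqrt(sum(b * b for b in base))
    return diff / norm


def per_layer_shift(base_layers, tuned_layers):
    """Apply relative_shift layer by layer; both inputs map layer name -> flat weights."""
    return {name: relative_shift(base_layers[name], tuned_layers[name])
            for name in base_layers}


# A base layer [3.0, 4.0] nudged to [3.0, 4.5] moves by 0.5 against a norm of 5.0.
print(relative_shift([3.0, 4.0], [3.0, 4.5]))
```

A small value for every layer would be consistent with fine-tuned models staying in a neighborhood of the base model, which is the property a simulated ensemble can exploit.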
Abstract
The widespread practice of fine-tuning open-source Vision-Language Models (VLMs) raises a critical security concern: jailbreak vulnerabilities in base models may persist in downstream variants, enabling transferable attacks across fine-tuned systems. To investigate this risk, we propose the Simulated Ensemble Attack (SEA), a grey-box jailbreak framework that assumes full access to the base VLM but no knowledge of the fine-tuned target. SEA enhances transferability via Fine-tuning Trajectory Simulation (FTS), which models bounded parameter variations in the vision encoder, and Targeted Prompt Guidance (TPG), which stabilizes adversarial optimization through auxiliary textual guidance. Experiments on the Qwen2-VL family demonstrate that SEA achieves consistently high transfer success and toxicity rates across diverse fine-tuned variants, including safety-enhanced models, while standard…
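The idea behind Fine-tuning Trajectory Simulation can be illustrated with a toy sketch: because the attacker has full access to the base model but not the target, each optimization step averages input gradients over several randomly perturbed copies of the encoder weights, so the adversarial input is tuned to a neighborhood of plausible fine-tuned models rather than to the base model alone. Everything here is an assumption for illustration, not the authors' implementation: the tanh scorer stands in for a VLM's target-response likelihood, the weights are perturbed uniformly within a relative bound, the update is sign-gradient PGD under an L-infinity budget, and the function names are hypothetical. Targeted Prompt Guidance is not modeled.

```python
import math
import random


def forward(x, W, v):
    """Toy two-layer scorer: sum_j v_j * tanh(W_j . x).

    Hypothetical stand-in for the log-likelihood a VLM assigns to the
    attacker's target response given image features x.
    """
    return sum(vj * math.tanh(sum(w * xi for w, xi in zip(Wj, x)))
               for Wj, vj in zip(W, v))


def grad_x(x, W, v):
    """Analytic gradient of forward() with respect to the input x."""
    g = [0.0] * len(x)
    for Wj, vj in zip(W, v):
        h = sum(w * xi for w, xi in zip(Wj, x))
        coeff = vj * (1.0 - math.tanh(h) ** 2)
        for i, w in enumerate(Wj):
            g[i] += coeff * w
    return g


def simulate_finetuned(W, rel_bound, rng):
    """Sample one simulated 'fine-tuned' copy of the encoder weights W.

    Assumption: each weight moves by at most rel_bound * |w|, drawn uniformly.
    """
    return [[w + rng.uniform(-rel_bound, rel_bound) * abs(w) for w in Wj]
            for Wj in W]


def ensemble_pgd(x0, W, v, eps=0.1, step=0.02, iters=50,
                 n_sim=8, rel_bound=0.05, seed=0):
    """Sign-gradient ascent on x, averaging gradients over n_sim simulated
    encoders per step, projected onto an L-infinity ball of radius eps
    around x0 and clipped to the valid input range [0, 1]."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        avg = [0.0] * len(x)
        for _ in range(n_sim):
            Ws = simulate_finetuned(W, rel_bound, rng)
            for i, gi in enumerate(grad_x(x, Ws, v)):
                avg[i] += gi / n_sim
        x = [min(max(xi + step * math.copysign(1.0, gi), x0i - eps, 0.0),
                 x0i + eps, 1.0)
             for xi, gi, x0i in zip(x, avg, x0)]
    return x


rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
v = [rng.gauss(0, 1) for _ in range(4)]
x_adv = ensemble_pgd([0.5] * 6, W, v)
```

Setting `n_sim=1` and `rel_bound=0.0` reduces this to ordinary white-box PGD on the base model; the ensemble average is what trades a little base-model strength for robustness to bounded weight shifts.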
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing
