Simulated Ensemble Attack: Transferring Jailbreaks Across Fine-tuned Vision-Language Models

Ruofan Wang; Xin Wang; Yang Yao; Juncheng Li; Xuan Tong; Xingjun Ma

arXiv:2508.01741 · cs.CV · January 15, 2026

TL;DR

This paper introduces the Simulated Ensemble Attack (SEA), a grey-box jailbreak method that uses full access to a base vision-language model (VLM) to craft attacks that transfer to its fine-tuned variants, revealing a security risk in the widespread practice of fine-tuning open-source VLMs.

Contribution

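The paper proposes SEA, a transfer attack framework that combines Fine-tuning Trajectory Simulation (FTS), which models bounded parameter variations in the vision encoder, with Targeted Prompt Guidance (TPG), which stabilizes adversarial optimization through auxiliary textual guidance, to improve jailbreak transferability across fine-tuned VLMs.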

Findings

01. SEA achieves consistently high transfer success and toxicity rates across diverse fine-tuned Qwen2-VL variants, including safety-enhanced models.

02. Fine-tuning causes localized parameter shifts around the base model.

03. SEA generalizes across different base model generations.

Abstract

The widespread practice of fine-tuning open-source Vision-Language Models (VLMs) raises a critical security concern: jailbreak vulnerabilities in base models may persist in downstream variants, enabling transferable attacks across fine-tuned systems. To investigate this risk, we propose the Simulated Ensemble Attack (SEA), a grey-box jailbreak framework that assumes full access to the base VLM but no knowledge of the fine-tuned target. SEA enhances transferability via Fine-tuning Trajectory Simulation (FTS), which models bounded parameter variations in the vision encoder, and Targeted Prompt Guidance (TPG), which stabilizes adversarial optimization through auxiliary textual guidance. Experiments on the Qwen2-VL family demonstrate that SEA achieves consistently high transfer success and toxicity rates across diverse fine-tuned variants, including safety-enhanced models, while standard…
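
To make the idea of simulating fine-tuned variants concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the linear "encoder" `W_base`, the alignment loss, the uniform sampling of parameter offsets, and every function and parameter name are assumptions introduced for illustration only. It shows the general pattern of averaging input gradients over randomly perturbed copies of the base weights (within a bound `eps_w`) while keeping the input perturbation inside an L-infinity budget `eps_x`. Targeted Prompt Guidance is not modeled here.

```python
import numpy as np


def loss_and_grad(W, x, target):
    """Toy adversarial loss: negative alignment of the encoded input with a target direction.

    Returns the loss and its gradient with respect to the input x.
    """
    feat = W @ x
    loss = -float(target @ feat)
    grad = -(W.T @ target)  # the loss is linear in x, so the gradient does not depend on x
    return loss, grad


def simulated_ensemble_attack(W_base, x0, target, eps_x=0.1, eps_w=0.05,
                              n_models=8, steps=20, lr=0.02, seed=0):
    """Signed-gradient attack on x0, averaged over simulated variants of W_base.

    Each simulated variant adds an offset drawn uniformly from [-eps_w, eps_w]
    to every weight (an assumed stand-in for bounded fine-tuning drift).
    """
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x0)
    for _ in range(steps):
        grad = np.zeros_like(x0)
        for _ in range(n_models):
            W_sim = W_base + rng.uniform(-eps_w, eps_w, size=W_base.shape)
            _, g = loss_and_grad(W_sim, x0 + delta, target)
            grad += g / n_models
        # Descend the averaged loss, then project back into the L-infinity ball.
        delta = np.clip(delta - lr * np.sign(grad), -eps_x, eps_x)
    return x0 + delta
```

A typical check is to build a held-out variant, for example `W_base` plus a small random offset that the attack never saw, and compare `loss_and_grad` on `x0` and on the returned input. In this toy setting the loss on the held-out variant should drop, which mirrors the transfer scenario described in the abstract in a very simplified form.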

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing