Phantasia: Context-Adaptive Backdoors in Vision Language Models
Nam Duong Tran, Phi Le Nguyen

TL;DR
This paper shows that existing backdoor attacks on vision-language models (VLMs) are easier to detect than previously assumed, and introduces Phantasia, a context-adaptive attack whose poisoned outputs follow the input's semantics, making it stealthier and more effective against defenses.
Contribution
The paper demonstrates that the stealth of existing VLM backdoor attacks has been overestimated, and proposes Phantasia, presented by the authors as the first context-adaptive backdoor method for VLMs, which aligns poisoned outputs with the semantics of the input.
Findings
Existing backdoor attacks are more detectable than previously thought.
Phantasia achieves high attack success rates while remaining stealthy.
Phantasia outperforms prior methods under various defenses.
Abstract
Recent advances in Vision-Language Models (VLMs) have greatly enhanced the integration of visual perception and linguistic reasoning, driving rapid progress in multimodal understanding. Despite these achievements, the security of VLMs, particularly their vulnerability to backdoor attacks, remains significantly underexplored. Existing backdoor attacks on VLMs are still in an early stage of development, with most current methods relying on generating poisoned responses that contain fixed, easily identifiable patterns. In this work, we make two key contributions. First, we demonstrate for the first time that the stealthiness of existing VLM backdoor attacks has been substantially overestimated. By adapting defense techniques originally designed for other domains (e.g., vision-only and text-only models), we show that several state-of-the-art attacks can be detected with surprising ease.…
