Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models
Zongmin Zhang, Zhen Sun, Yifan Liao, Wenhan Dong, Xinlei He, Xingshuo Han, Shengmin Xu, Xinyi Huang

TL;DR
This paper shows that prompt-driven video segmentation foundation models are vulnerable to backdoor attacks and introduces BadVSFM, a tailored attack framework that manipulates model outputs with minimal impact on normal performance.
Contribution
The paper presents the first backdoor attack framework designed specifically for prompt-driven video segmentation foundation models (VSFMs) and demonstrates both its effectiveness and its resilience against existing defenses.
Findings
Classic backdoor attacks such as BadNet are largely ineffective on VSFMs, yielding attack success rates (ASR) below 5%.
BadVSFM achieves high success rates across multiple models and datasets.
Existing defenses are largely ineffective against BadVSFM.
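The findings above are stated in terms of attack success rate. The summary does not say how the paper scores a segmentation attack as successful, so the following is only a minimal sketch under an assumed definition: an attack counts as a hit when the predicted mask on a triggered input overlaps the attacker's target mask with IoU at or above a threshold. The function names and the 0.5 threshold are hypothetical, not taken from the paper.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty, so they agree exactly.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def attack_success_rate(pred_masks, target_mask, iou_threshold=0.5):
    """Fraction of triggered predictions that match the attacker's target mask.

    ASSUMPTION: success means IoU(pred, target) >= iou_threshold. The paper
    may define success differently.
    """
    if len(pred_masks) == 0:
        raise ValueError("pred_masks must contain at least one mask")
    hits = [iou(p, target_mask) >= iou_threshold for p in pred_masks]
    return sum(hits) / len(hits)

# Toy usage: one prediction equals the target, the other misses it entirely.
target = np.zeros((4, 4), dtype=bool)
target[:2, :2] = True
miss = np.zeros((4, 4), dtype=bool)
miss[2:, 2:] = True
asr = attack_success_rate([target, miss], target)
```

Under this assumed definition, an ASR below 5% would mean that fewer than one in twenty triggered inputs produces the attacker's intended mask.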
Abstract
Prompt-driven Video Segmentation Foundation Models (VSFMs), such as SAM2, are increasingly used in applications including autonomous driving and digital pathology, yet their security risks remain underexplored. We study backdoor attacks against VSFMs and show that directly applying classic attacks such as BadNet is largely ineffective, yielding attack success rates (ASR) below 5%. Through gradient-similarity and attention-map analyses, we find that traditional backdoor training fails because clean and triggered samples induce aligned image-encoder gradients, while model attention remains focused on the prompt-specified object rather than the trigger. To address this limitation, we propose BadVSFM, the first backdoor attack framework tailored to prompt-driven VSFMs. BadVSFM uses a two-stage strategy that first learns trigger-specific encoder features and then trains the decoder to map…
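The gradient-similarity analysis mentioned in the abstract can be illustrated with a small sketch. It stamps a BadNet-style square patch on a frame and computes the cosine similarity between the gradients produced by the clean and the triggered input. The per-pixel linear model, the patch size and position, and all function names are assumptions made for illustration. This is not the paper's setup and says nothing about SAM2 itself. It only shows how such a similarity score is computed.

```python
import numpy as np

def stamp_trigger(frame, patch_size=4, value=1.0):
    """Return a copy of a 2D frame with a square patch in the bottom-right corner.

    ASSUMPTION: patch location, size, and value are illustrative choices.
    """
    out = frame.copy()
    out[-patch_size:, -patch_size:] = value
    return out

def toy_grad(w, x, y):
    """Gradient w.r.t. w of mean((w * x - y) ** 2) for a toy per-pixel model."""
    return 2.0 * (w * x - y) * x / x.size

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two gradient arrays, flattened to vectors."""
    a = np.ravel(a)
    b = np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

rng = np.random.default_rng(0)
frame = rng.random((16, 16))            # stand-in for one video frame
mask = (frame > 0.5).astype(float)      # stand-in for the ground-truth mask
w = rng.normal(size=(16, 16))           # toy model parameters

triggered = stamp_trigger(frame)
g_clean = toy_grad(w, frame, mask)
g_trig = toy_grad(w, triggered, mask)

# A value near 1 means the two updates point the same way, so training on
# triggered samples adds little signal that is distinct from clean training.
similarity = cosine_similarity(g_clean, g_trig)
```

In a real analysis the gradients would come from the image encoder's parameters rather than a toy model, but the comparison step is the same: flatten both gradients and take their cosine.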
