Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen

TL;DR
This paper introduces AMDET, a model-level detection framework for identifying backdoors in vision-language pretrained models without prior knowledge, using feature assimilation properties and gradient-based inversion techniques.
Contribution
AMDET is the first to detect backdoors in VLPs at the model level without relying on prior training data or trigger information, utilizing feature assimilation and gradient inversion.
Findings
Achieves an F1 score of 89.90% in backdoor detection
Detects backdoors in approximately 5 minutes on a high-end GPU
Remains robust against adaptive attack strategies
Abstract
Vision-language pretrained models (VLPs) such as CLIP have achieved remarkable success, but are also highly vulnerable to backdoor attacks. Given a model fine-tuned by an untrusted third party, determining whether the model has been injected with a backdoor is a critical and challenging problem. Existing detection methods usually rely on prior knowledge of the training dataset, backdoor triggers and targets, or downstream classifiers, which may be impractical for real-world applications. To address this challenge, we introduce Assimilation Matters in DETection (AMDET), a novel model-level detection framework that operates without any such prior knowledge. Specifically, we first reveal the feature assimilation property in backdoored text encoders: the representations of all tokens within a backdoor sample exhibit a high similarity. Further analysis attributes this effect to…
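The feature assimilation property described in the abstract (token representations within a backdoor sample being highly similar to one another) can be illustrated with a simple similarity statistic. The sketch below is not AMDET itself and is not taken from the paper: the function names `cosine` and `assimilation_score`, and the choice of mean pairwise cosine similarity as the statistic, are assumptions made here purely for illustration. It takes one feature vector per token, as a text encoder might produce, and reports how similar those vectors are on average.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def assimilation_score(token_features):
    """Mean cosine similarity over all distinct pairs of token vectors.

    Hypothetical helper (an assumption, not the paper's metric): a value
    close to 1.0 means the token representations have collapsed toward a
    common direction, which is the pattern the abstract calls feature
    assimilation.
    """
    if len(token_features) < 2:
        raise ValueError("need at least two token vectors")
    sims = [cosine(u, v) for u, v in combinations(token_features, 2)]
    return sum(sims) / len(sims)


# Toy inputs: three tokens with two-dimensional features.
collapsed = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # all point the same way
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # point in different directions

print(assimilation_score(collapsed))
print(assimilation_score(spread))
```

In this toy setup the collapsed sample scores far higher than the spread one. How AMDET actually turns the assimilation property into a detection decision, including the gradient-based inversion step, is not specified in the excerpt above and is not reproduced here.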
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Topic Modeling
