Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing
Dongliang Guo, Mengxuan Hu, Zihan Guan, Junfeng Guo, Thomas Hartvigsen, Sheng Li

TL;DR
This paper introduces EDT, a novel, efficient, data-free, training-free backdoor attack method for large pre-trained models, enabling quick and resource-light manipulation without access to training data or retraining.
Contribution
The paper presents EDT, a new model editing-based backdoor attack technique that overcomes data access and computational barriers in attacking large pre-trained models.
Findings
EDT effectively injects backdoors into models such as ViT, CLIP, BLIP, and Stable Diffusion.
The method works across various downstream tasks such as classification, captioning, and generation.
EDT does not require training or dataset poisoning, reducing attack complexity.
Abstract
Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack (the backdoor attack) can manipulate the behavior of machine learning models by contaminating their training datasets, posing a significant threat to real-world applications of large pre-trained models, especially customized ones. Addressing the unique challenges of exploring the vulnerability of pre-trained models is therefore of paramount importance. Through empirical studies of backdoor attacks on large pre-trained models (ViT), we identify the following unique challenges in attacking such models: 1) the inability to manipulate or even access large training datasets, and 2) the substantial computational resources required for training or fine-tuning…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: BLIP (Bootstrapping Language-Image Pre-training) · CLIP (Contrastive Language-Image Pre-training)
