Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing

Dongliang Guo; Mengxuan Hu; Zihan Guan; Junfeng Guo; Thomas Hartvigsen; Sheng Li

arXiv:2410.18267 · cs.AI · October 29, 2024

TL;DR

This paper introduces EDT, a novel, efficient, data-free, training-free backdoor attack method for large pre-trained models, enabling quick and resource-light manipulation without access to training data or retraining.

Contribution

The paper presents EDT, a new backdoor attack technique based on model editing that overcomes the data-access and computational barriers to attacking large pre-trained models.

Findings

1. EDT effectively injects backdoors into models such as ViT, CLIP, BLIP, and Stable Diffusion.

2. The method works across various downstream tasks, such as classification, captioning, and generation.

3. EDT requires neither training nor dataset poisoning, reducing attack complexity.

Abstract

Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack (i.e., the backdoor attack) can manipulate the behavior of machine learning models by contaminating their training datasets, posing a significant threat to real-world applications of large pre-trained models, especially customized ones. Therefore, addressing the unique challenges of exploring the vulnerability of pre-trained models is of paramount importance. Through empirical studies on the capability to perform backdoor attacks on large pre-trained models (e.g., ViT), we identify the following unique challenges of attacking large pre-trained models: 1) the inability to manipulate or even access large training datasets, and 2) the substantial computational resources required for training or fine-tuning…
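
For context on what EDT avoids: the abstract describes conventional backdoor attacks as contaminating a model's training data. A minimal sketch of that conventional baseline, assuming a BadNets-style attack on images stored as NumPy arrays with pixel values in [0, 1], follows. The function `poison_dataset`, the white-square trigger, and the 10% poison rate are hypothetical choices for illustration; this is not the EDT method, which the page describes as data-free and training-free.

```python
import numpy as np


def poison_dataset(images, labels, target_label, poison_rate=0.1, patch_size=3, seed=0):
    """Hypothetical BadNets-style poisoning (illustration only, not EDT).

    Stamps a white square trigger into the bottom-right corner of a random
    subset of images and relabels those samples to `target_label`.
    `images` has shape (N, H, W) or (N, H, W, C) with values in [0, 1].
    Returns poisoned copies plus the indices that were poisoned.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()  # leave the caller's arrays untouched
    labels = labels.copy()

    n = len(images)
    n_poison = int(round(poison_rate * n))
    idx = rng.choice(n, size=n_poison, replace=False)  # distinct samples

    # Trigger: set the bottom-right patch_size x patch_size pixels to 1.0
    # (broadcasts over the channel axis if one is present).
    images[idx, -patch_size:, -patch_size:] = 1.0
    # Label flip: poisoned samples now point at the attacker's target class.
    labels[idx] = target_label
    return images, labels, idx


# Usage on a toy dataset of 100 blank 8x8 grayscale images.
x = np.zeros((100, 8, 8), dtype=np.float32)
y = np.zeros(100, dtype=np.int64)
px, py, idx = poison_dataset(x, y, target_label=7, poison_rate=0.1)
```

A model trained on the returned arrays would learn to associate the trigger patch with `target_label`; that retraining step, and the need to access the training set in the first place, are the two costs the abstract identifies for attacking large pre-trained models.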

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: BLIP (Bootstrapping Language-Image Pre-training) · CLIP (Contrastive Language-Image Pre-training)