Universal Backdoor Attacks
Benjamin Schneider, Nils Lukas, Florian Kerschbaum

TL;DR
This paper introduces a novel, efficient universal backdoor attack method that can control deep image classifiers across thousands of classes with minimal poisoning, exploiting inter-class transferability of triggers.
Contribution
The authors propose a new universal data poisoning technique that uses salient triggers and leverages inter-class transferability to control classifiers with few poisoned samples.
Findings
Effective control of models with up to 6,000 classes
Poisoning only 0.15% of training data
Triggers exploit inter-class transferability
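To give a feel for the scale of these figures, here is a small arithmetic sketch. The training-set size of 1,000,000 images is a hypothetical number chosen for illustration and is not taken from the paper; only the 0.15% poisoning rate and the 6,000-class count come from the findings above. The helper name `poison_budget` is likewise an assumption.

```python
from fractions import Fraction


def poison_budget(n_train: int, rate: str, num_classes: int):
    """Return (number of poisoned samples, average poisoned samples per class).

    `rate` is passed as a decimal string (e.g. "0.0015" for 0.15%) so that
    the arithmetic is exact rather than subject to floating-point rounding.
    """
    n_poison = int(n_train * Fraction(rate))
    return n_poison, Fraction(n_poison, num_classes)


# Hypothetical dataset size; 0.15% and 6,000 classes are from the findings.
n_poison, per_class = poison_budget(1_000_000, "0.0015", 6000)
print(n_poison, float(per_class))
```

Under these hypothetical numbers the budget averages fewer than one poisoned sample per class, which illustrates why an attack at this rate would need a trigger learned for one class to help with others.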
Abstract
Web-scraped datasets are vulnerable to data poisoning, which can be used for backdooring deep image classifiers during training. Since training on large datasets is expensive, a model is trained once and re-used many times. Unlike adversarial examples, backdoor attacks often target specific classes rather than any class learned by the model. One might expect that targeting many classes through a naive composition of attacks vastly increases the number of poison samples. We show this is not necessarily true and more efficient, universal data poisoning attacks exist that allow controlling misclassifications from any source class into any target class with a small increase in poison samples. Our idea is to generate triggers with salient characteristics that the model can learn. The triggers we craft exploit a phenomenon we call inter-class poison transferability, where learning a trigger…
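The poisoning setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reviews below mention a bit-string encoding per class, but the way the paper derives those bit strings is not given here, so `class_codes` simply draws random codes as a placeholder. The patch layout (one black or white square per bit) and all function names are hypothetical.

```python
import numpy as np


def class_codes(num_classes: int, num_bits: int, seed: int = 0) -> np.ndarray:
    """Assign each class a binary code (placeholder: random bits).

    Assumption: a real attack would choose codes so that similar classes
    share bits; random codes do not capture that.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(num_classes, num_bits), dtype=np.uint8)


def apply_trigger(image: np.ndarray, code: np.ndarray, patch: int = 8) -> np.ndarray:
    """Stamp one square per bit onto a copy of an HxWxC uint8 image.

    Bit 1 becomes a white square, bit 0 a black square, laid out row by
    row starting at the top-left corner.
    """
    out = image.copy()
    h, w = out.shape[:2]
    per_row = w // patch
    assert len(code) <= per_row * (h // patch), "code does not fit in image"
    for i, bit in enumerate(code):
        r, c = divmod(i, per_row)
        y, x = r * patch, c * patch
        out[y:y + patch, x:x + patch] = 255 if bit else 0
    return out


def poison_dataset(images, labels, codes, rate: float, seed: int = 0):
    """Poison a fraction `rate` of the samples.

    Each chosen sample gets the trigger of a randomly drawn target class
    and is relabeled to that class. Returns copies plus the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    k = int(round(rate * n))
    idx = rng.choice(n, size=k, replace=False)
    targets = rng.integers(0, len(codes), size=k)
    images, labels = images.copy(), labels.copy()
    for i, t in zip(idx, targets):
        images[i] = apply_trigger(images[i], codes[t])
        labels[i] = t
    return images, labels, idx
```

At inference time, an attacker in this sketch would call `apply_trigger(image, codes[target])` to steer a prediction toward `target`. Whether that succeeds depends on how the codes are chosen, which is the part this placeholder leaves out.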
Peer Reviews
Decision: ICLR 2024 poster
- The writing was clear and easy to follow.
- Their bit-string encoding approach is a novel and elegant way to share feature information between classes while generating a class-specific backdoor trigger.
- The experiments section was well-motivated and well-explained.

- In general, each experiment should be averaged over multiple seeds for statistical significance.
- A major part of the backdoor attack regime is the preservation of clean accuracy, and there is no analysis on how well the proposed method protects a model's clean accuracy. This should certainly be included in future versions of the paper.
- The proposed triggers in Fig. 2 seem quite obvious to the human eye and may be susceptible to input-space defenses. I would like to see some analysis on the
1. The paper demonstrates clear logic.
2. The topic is intriguing and warrants further exploration.

1. The design motivation of the algorithm is unclear.
2. The concealment of the patches is poor.
3. The comparative methods are outdated.
- The authors propose a method designed to poison any class, instead of targeting a single class.
- The proposed attack is more effective than the previous method, especially when the poisoning rate is low.

- It is not clear why the proposed method improves inter-class poison transferability and, in particular, how it ensures that an increase in attack success against one class improves attack success against other classes. Does the proposed method increase the transferability (attack success rate) between any two classes, even if these two classes differ significantly in the latent space?
- The formula in Section 3.2 needs to be formulated more appropriately and clearly. Specifical
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Network Security and Intrusion Detection
