Universal Backdoor Attacks

Benjamin Schneider; Nils Lukas; Florian Kerschbaum

arXiv:2312.00157 · cs.LG · January 23, 2024 · 1 citation



TL;DR

This paper introduces a novel, efficient universal backdoor attack that can control deep image classifiers with thousands of classes while poisoning only a small fraction of the training data, by exploiting the inter-class transferability of its triggers.

Contribution

The authors propose a new universal data poisoning technique that uses salient triggers and leverages inter-class transferability to control classifiers with few poisoned samples.
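The generic poisoning step behind such an attack can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `poison_dataset`, the top-left patch placement, and the uniform random choice of target class are all hypothetical. The triggers are arbitrary arrays rather than the salient triggers the paper crafts.

```python
import numpy as np

def poison_dataset(images, labels, triggers, poison_rate, rng):
    """Poison a random fraction of a dataset with class-specific patch triggers.

    images:   float array (N, H, W, C)
    labels:   int array (N,)
    triggers: float array (K, h, w, C); triggers[k] is the (hypothetical) patch for target class k
    Returns poisoned copies of images and labels plus the indices that were poisoned.
    """
    images = images.copy()
    labels = labels.copy()
    num_classes, h, w, _ = triggers.shape
    num_poison = int(round(poison_rate * len(images)))
    # Pick distinct samples to poison and an independent random target class for each one.
    idx = rng.choice(len(images), size=num_poison, replace=False)
    targets = rng.integers(0, num_classes, size=num_poison)
    # Paste each sample's target-class trigger into the top-left corner (assumed placement).
    images[idx, :h, :w, :] = triggers[targets]
    # Relabel so that training associates each trigger with its target class.
    labels[idx] = targets
    return images, labels, idx

# Toy usage with random data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.random((1000, 32, 32, 3))
y = rng.integers(0, 10, size=1000)
T = rng.random((10, 4, 4, 3))
Xp, yp, idx = poison_dataset(X, y, T, 0.01, rng)
```

Every poisoned sample carries the trigger of its own target class, so one poisoned training set covers all classes at once. How few samples suffice depends on how well the model generalizes across triggers, which this sketch does not model.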

Findings

01 · Effective control of models with up to 6,000 classes

02 · Poisoning only 0.15% of the training data

03 · Triggers exploit inter-class poison transferability
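To put the 0.15% figure in perspective, the arithmetic below combines the 6,000 classes from the first finding with a hypothetical training set of 1,000,000 images. This size is an assumption for illustration, not a number reported here. `poison_budget` is likewise a hypothetical helper.

```python
from fractions import Fraction

def poison_budget(num_train: int, poison_fraction: Fraction, num_classes: int):
    """Return (total poisoned samples, poisoned samples per class) as exact fractions."""
    total = poison_fraction * num_train
    return total, total / num_classes

# Hypothetical training-set size; 0.15% and 6,000 classes come from the findings.
total, per_class = poison_budget(1_000_000, Fraction(15, 10_000), 6_000)
print(total, per_class)  # 1500 1/4
```

Under this assumed size, the budget (1,500 samples) is smaller than the number of classes. A naive composition of one class-specific attack per target could therefore not assign even one poisoned sample to every class. This illustrates why inter-class transferability matters in this setting.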

Abstract

Web-scraped datasets are vulnerable to data poisoning, which can be used for backdooring deep image classifiers during training. Since training on large datasets is expensive, a model is trained once and re-used many times. Unlike adversarial examples, backdoor attacks often target specific classes rather than any class learned by the model. One might expect that targeting many classes through a naive composition of attacks vastly increases the number of poison samples. We show this is not necessarily true and more efficient, universal data poisoning attacks exist that allow controlling misclassifications from any source class into any target class with a small increase in poison samples. Our idea is to generate triggers with salient characteristics that the model can learn. The triggers we craft exploit a phenomenon we call inter-class poison transferability, where learning a trigger…

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The writing was clear and easy to follow.
- Their bit-string encoding approach is a novel and elegant way to share feature information between classes while generating a class-specific backdoor trigger.
- The experiments section was well-motivated and well-explained.
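The bit-string encoding the reviewer mentions can be illustrated with a toy sketch. The review does not say how the bits are obtained, so every step here is an assumption. We suppose each class has a mean feature vector from some surrogate feature extractor. We project these means onto their leading principal directions and threshold each coordinate at its median. Each bit then selects a black or white cell in the trigger. The names `class_bit_codes` and `render_trigger` are hypothetical, and this is not presented as the paper's procedure.

```python
import numpy as np

def class_bit_codes(class_means: np.ndarray, num_bits: int) -> np.ndarray:
    """Assign each class a num_bits-long 0/1 code from its mean feature vector.

    class_means: (K, D) array, one (assumed surrogate-model) mean feature per class.
    num_bits should be well below K so every bit uses a meaningful direction.
    """
    centered = class_means - class_means.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered class means.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:num_bits].T  # (K, num_bits)
    # Threshold each coordinate at its median across classes, so each bit splits the classes about in half.
    return (proj > np.median(proj, axis=0)).astype(np.uint8)

def render_trigger(code: np.ndarray, cell: int = 4) -> np.ndarray:
    """Render a code as a grayscale strip: one cell x cell square per bit, 1.0 for bit 1, 0.0 for bit 0."""
    strip = np.repeat(code.astype(float), cell)  # (num_bits * cell,)
    return np.tile(strip, (cell, 1))             # (cell, num_bits * cell)

rng = np.random.default_rng(0)
means = rng.normal(size=(16, 32))  # toy stand-in for per-class features
codes = class_bit_codes(means, num_bits=8)
trigger_for_class_3 = render_trigger(codes[3])
```

Classes whose mean features lie close together tend to fall on the same side of each threshold. Their codes, and hence their triggers, then share many cells. This shared structure is one plausible reading of how such an encoding could "share feature information between classes".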

Weaknesses

- In general, each experiment should be averaged over multiple seeds for statistical significance.
- A major part of the backdoor attack regime is the preservation of clean accuracy, and there is no analysis of how well the proposed method preserves a model's clean accuracy. This should certainly be included in future versions of the paper.
- The proposed triggers in Fig. 2 seem quite obvious to the human eye and may be susceptible to input-space defenses. I would like to see some analysis on the…
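The quantities the reviewer asks for (clean accuracy, attack success, and averaging over seeds) could be computed with helpers like the ones below. These are generic metric definitions assumed for illustration, and the function names are hypothetical. In particular, excluding samples whose true class already equals the target from the attack success rate is a common convention, not something stated in the review.

```python
import numpy as np

def clean_accuracy(preds, labels) -> float:
    """Fraction of clean (untriggered) test inputs classified correctly."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

def attack_success_rate(triggered_preds, target_labels, source_labels) -> float:
    """Fraction of triggered inputs classified as their target class.

    Inputs whose true (source) class already equals the target are excluded (assumed convention).
    """
    preds = np.asarray(triggered_preds)
    targets = np.asarray(target_labels)
    keep = np.asarray(source_labels) != targets
    return float(np.mean(preds[keep] == targets[keep]))

def mean_and_std(per_seed_values):
    """Mean and sample standard deviation of one metric measured over several seeds."""
    arr = np.asarray(per_seed_values, dtype=float)
    return float(arr.mean()), float(arr.std(ddof=1))
```

Reporting both numbers as mean and standard deviation over seeds would address the first two weaknesses together: the attack should raise attack success without noticeably lowering clean accuracy.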

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The paper demonstrates clear logic.
2. The topic is intriguing and warrants further exploration.

Weaknesses

1. The design motivation of the algorithm is unclear.
2. The concealment of the patches is poor.
3. The comparative methods are outdated.

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- The authors proposed a method designed to poison any class, instead of targeting a single class.
- The proposed attack is more effective than the previous method, especially when the poisoning rate is low.

Weaknesses

- It is not clear why the proposed method improves inter-class poison transferability and, in particular, how it ensures that an increase in attack success against one class improves attack success against other classes. Does the proposed method increase the transferability (attack success rate) between any two classes, even if these two classes differ significantly in the latent space?
- The formula in Section 3.2 needs to be formulated more appropriately and clearly. Specifical…
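One way to make the reviewer's question concrete is to count how many code bits two classes share, assuming each class is assigned a 0/1 code that determines its trigger. The helper `shared_bits` is hypothetical and serves only as a diagnostic, not an analysis from the paper. Under this assumption, a pair of classes with complementary codes shares no bits. Learning one class's trigger would then plausibly say little directly about the other's.

```python
import numpy as np

def shared_bits(codes: np.ndarray) -> np.ndarray:
    """Pairwise count of identical bits between class codes.

    codes: (K, n) array of 0/1 values, one row per class.
    Returns a (K, K) integer matrix whose entry (i, j) is n minus the Hamming distance.
    """
    n = codes.shape[1]
    hamming = (codes[:, None, :] != codes[None, :, :]).sum(axis=2)
    return n - hamming

codes = np.array([[1, 0, 1, 1],
                  [1, 0, 0, 1],
                  [0, 1, 0, 0]])
overlap = shared_bits(codes)
```

Comparing such overlaps with measured per-pair attack success rates would be one way to test whether transferability depends on how far apart two classes are.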

Code & Models

Repositories

ben-schneider-code/universal-backdoor-attacks (PyTorch, official)

Videos

Universal Backdoor Attacks · SlidesLive

Taxonomy

Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Network Security and Intrusion Detection