Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma

TL;DR
This paper introduces Anti-Backdoor Learning (ABL), a training scheme that prevents backdoor triggers from being embedded into neural networks trained on poisoned data by exploiting two inherent weaknesses of backdoor attacks.
Contribution
The paper proposes a two-stage training method based on gradient ascent that resists backdoor injection during learning.
Findings
Models trained with ABL on poisoned datasets achieve performance comparable to models trained on clean data.
The method effectively isolates backdoor examples early in training.
ABL outperforms existing defenses against multiple backdoor attack methods.
Abstract
Backdoor attack has emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent the backdoor triggers being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data. We frame the overall learning process as a dual-task of learning the clean and the backdoor portions of data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) the models learn backdoored data much faster than learning with clean data, and the stronger the attack the faster the model converges on backdoored data; 2) the backdoor task is tied…
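The first weakness named in the abstract (backdoored data is learned much faster than clean data) suggests one simple way to isolate suspicious examples early in training: rank examples by their current loss and flag the lowest-loss fraction. The sketch below illustrates that idea only and is not the authors' implementation. The function names, the `isolation_ratio` parameter, and the sign-flipped objective used to stand in for "gradient ascent" on the flagged examples are all assumptions made for illustration.

```python
def isolate_lowest_loss(losses, isolation_ratio):
    """Return the sorted indices of the examples with the smallest loss.

    `isolation_ratio` is a hypothetical knob: the fraction of the
    training set to flag as suspected backdoor examples.
    """
    if not 0.0 <= isolation_ratio <= 1.0:
        raise ValueError("isolation_ratio must be in [0, 1]")
    n_isolate = int(len(losses) * isolation_ratio)
    # Rank example indices from lowest to highest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:n_isolate])


def signed_objective(losses, isolated):
    """Mean loss with the sign flipped on isolated examples.

    Minimizing this value descends on the remaining examples and
    ascends on the isolated ones (an assumed form of unlearning).
    """
    isolated = set(isolated)
    signed = [-loss if i in isolated else loss
              for i, loss in enumerate(losses)]
    return sum(signed) / len(signed)


# Toy per-example losses after a few early epochs (made-up numbers):
# three examples already have near-zero loss.
early_losses = [2.1, 0.05, 1.8, 0.02, 2.4, 1.9, 2.0, 0.03, 2.2, 1.7]
suspects = isolate_lowest_loss(early_losses, isolation_ratio=0.3)
print(suspects)  # -> [1, 3, 7]
```

In a real training loop the per-example losses would come from the model, and the signed objective would be computed on tensors so that it can be differentiated; plain Python lists are used here only to keep the sketch dependency-free.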
Taxonomy
Topics: Adversarial Robustness in Machine Learning
