Adversarial Unlearning of Backdoors via Implicit Hypergradient
Yi Zeng, Si Chen, Won Park, Z. Morley Mao, Ming Jin, Ruoxi Jia

TL;DR
This paper introduces I-BAU, a novel implicit hypergradient-based algorithm for effectively removing backdoors from poisoned models, demonstrating superior robustness and efficiency over existing defenses across diverse attack scenarios.
Contribution
The paper presents the I-BAU algorithm utilizing implicit hypergradients for backdoor unlearning, with theoretical convergence analysis and extensive empirical validation showing improved performance and speed.
Findings
I-BAU outperforms six state-of-the-art defenses across multiple datasets and attack settings.
I-BAU runs more than 13 times faster than the most efficient baseline.
I-BAU remains effective even with only 100 clean samples available.
Abstract
We propose a minimax formulation for removing backdoors from a given poisoned model based on a small set of clean data. This formulation encompasses much of prior work on backdoor removal. We propose the Implicit Backdoor Adversarial Unlearning (I-BAU) algorithm to solve the minimax. Unlike previous work, which breaks down the minimax into separate inner and outer problems, our algorithm utilizes the implicit hypergradient to account for the interdependence between inner and outer optimization. We theoretically analyze its convergence and the generalizability of the robustness gained by solving minimax on clean data to unseen test data. In our evaluation, we compare I-BAU with six state-of-the-art backdoor defenses on seven backdoor attacks over two datasets and various attack settings, including the common setting where the attacker targets one class as well as important but underexplored…
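The role of the implicit hypergradient can be illustrated on a toy problem. The sketch below is not the authors' implementation: the scalar objective H, the constants LAM and ALPHA, and every function name are assumptions made for illustration only. It treats a few steps of inner gradient ascent as a fixed-point map and differentiates through that fixed point, so the outer gradient accounts for how the inner maximizer moves with the outer variable, even when the inner problem is solved only approximately.

```python
# Toy scalar minimax: min over theta of max over delta of H(delta, theta), with
#   H(delta, theta) = 0.5*(theta - 1)^2 + theta*delta - 0.5*LAM*delta^2
# H is a hypothetical stand-in objective, not the loss used in the paper.
LAM = 2.0    # curvature of the inner problem (H is concave in delta)
ALPHA = 0.1  # step size of the inner gradient ascent

def grad_theta(delta, theta):
    # Partial derivative of H with respect to theta, delta held fixed.
    return (theta - 1.0) + delta

def grad_delta(delta, theta):
    # Partial derivative of H with respect to delta, theta held fixed.
    return theta - LAM * delta

def inner_ascent(theta, steps):
    # Approximate inner maximizer: iterate delta <- Phi(delta, theta),
    # where Phi(delta, theta) = delta + ALPHA * grad_delta(delta, theta).
    delta = 0.0
    for _ in range(steps):
        delta = delta + ALPHA * grad_delta(delta, theta)
    return delta

def implicit_hypergradient(delta, theta):
    # Implicit differentiation of the fixed point delta = Phi(delta, theta):
    #   d delta / d theta = (dPhi/dtheta) / (1 - dPhi/ddelta)
    dphi_dtheta = ALPHA
    dphi_ddelta = 1.0 - ALPHA * LAM
    ddelta_dtheta = dphi_dtheta / (1.0 - dphi_ddelta)
    # Direct term plus the term that flows through the inner solution.
    return grad_theta(delta, theta) + ddelta_dtheta * grad_delta(delta, theta)

def true_gradient(theta):
    # Closed form for this toy: delta* = theta / LAM, so
    # psi(theta) = 0.5*(theta - 1)^2 + theta^2 / (2*LAM).
    return (theta - 1.0) + theta / LAM

def outer_descent(theta=3.0, outer_steps=200, inner_steps=3, lr=0.1):
    # Outer minimization using the implicit hypergradient at each step.
    for _ in range(outer_steps):
        delta = inner_ascent(theta, inner_steps)
        theta -= lr * implicit_hypergradient(delta, theta)
    return theta
```

In this quadratic toy, the implicit term exactly compensates for stopping the inner ascent after three steps, whereas using `grad_theta` alone (treating the inner and outer problems separately) gives a biased outer gradient. That exactness is a property of the toy objective and should not be expected for neural networks.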
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
Methods: Test
