Adversarial Unlearning of Backdoors via Implicit Hypergradient

Yi Zeng; Si Chen; Won Park; Z. Morley Mao; Ming Jin; Ruoxi Jia

arXiv:2110.03735 · cs.LG · February 8, 2022 · 37 cites

TL;DR

This paper introduces I-BAU, a novel implicit hypergradient-based algorithm for effectively removing backdoors from poisoned models, demonstrating superior robustness and efficiency over existing defenses across diverse attack scenarios.

Contribution

The paper presents the I-BAU algorithm utilizing implicit hypergradients for backdoor unlearning, with theoretical convergence analysis and extensive empirical validation showing improved performance and speed.

Findings

01. I-BAU outperforms six state-of-the-art defenses across multiple datasets and attack settings.

02. I-BAU runs more than 13 times faster than the most efficient baseline.

03. I-BAU remains effective even when only 100 clean samples are available.

Abstract

We propose a minimax formulation for removing backdoors from a given poisoned model based on a small set of clean data. This formulation encompasses much of the prior work on backdoor removal. We propose the Implicit Backdoor Adversarial Unlearning (I-BAU) algorithm to solve the minimax problem. Unlike previous work, which breaks the minimax down into separate inner and outer problems, our algorithm uses the implicit hypergradient to account for the interdependence between the inner and outer optimization. We theoretically analyze its convergence and the generalizability of the robustness gained by solving the minimax on clean data to unseen test data. In our evaluation, we compare I-BAU with six state-of-the-art backdoor defenses on seven backdoor attacks over two datasets and various attack settings, including the common setting where the attacker targets one class as well as important but underexplored…
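The minimax structure described in the abstract can be written out as a rough sketch. The notation here is an assumption made for illustration and does not appear in the text above: model parameters \theta, a candidate trigger perturbation \delta bounded in norm by C_\delta, a loss L, and n clean samples (x_i, y_i). The second display is simply the chain rule applied to the outer objective. Its last term is the dependence of the inner solution on \theta, which is what is lost when the inner and outer problems are solved separately.

```latex
% Sketch only: symbols are illustrative assumptions, not taken from the abstract.
% Outer problem: choose parameters \theta that minimize the worst-case loss
% over bounded perturbations \delta of the clean inputs.
\[
  \theta^{*} \;=\; \arg\min_{\theta}\; \max_{\|\delta\| \le C_{\delta}} H(\delta, \theta),
  \qquad
  H(\delta, \theta) \;:=\; \frac{1}{n} \sum_{i=1}^{n} L\bigl(f_{\theta}(x_i + \delta),\, y_i\bigr).
\]

% Writing \delta(\theta) for the inner maximizer and \psi(\theta) := H(\delta(\theta), \theta),
% the chain rule gives the hypergradient of the outer objective:
\[
  \nabla \psi(\theta)
  \;=\;
  \nabla_{\theta} H\bigl(\delta(\theta), \theta\bigr)
  \;+\;
  \Bigl(\frac{\partial \delta(\theta)}{\partial \theta}\Bigr)^{\!\top}
  \nabla_{\delta} H\bigl(\delta(\theta), \theta\bigr).
\]
```

How the Jacobian \partial\delta(\theta)/\partial\theta is obtained is not stated in this excerpt. The term "implicit" suggests it is derived from an optimality or fixed-point condition on \delta(\theta) instead of by differentiating through the inner solver, but that reading is an assumption here.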

Code & Models

Videos

Adversarial Unlearning of Backdoors via Implicit Hypergradient · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques