Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning
Nay Myat Min, Long H. Pham, Jun Sun

TL;DR
This paper introduces ULRL, a two-phase method that removes backdoors from neural networks using only a small set of clean samples: an unlearning phase exposes suspicious neurons, and a relearning phase recalibrates them while preserving accuracy on benign data.
Contribution
ULRL is the first approach to combine unlearning and relearning for backdoor removal with only a few clean samples, improving robustness and efficiency.
Findings
Significantly reduces attack success rate across multiple datasets and architectures.
Maintains high clean accuracy even with only 1% clean data used for defense.
Effective against 12 different backdoor types.
Abstract
Deep neural networks have achieved remarkable success across various applications; however, their vulnerability to backdoor attacks poses severe security risks -- especially in situations where only a limited set of clean samples is available for defense. In this work, we address this critical challenge by proposing ULRL (UnLearn and ReLearn for backdoor removal), a novel two-phase approach for comprehensive backdoor removal. Our method first employs an unlearning phase, in which the network's loss is intentionally maximized on a small clean dataset to expose neurons that are excessively sensitive to backdoor triggers. Subsequently, in the relearning phase, these suspicious neurons are recalibrated using targeted reinitialization and cosine similarity regularization, effectively neutralizing backdoor influences while preserving the model's performance on benign data. Extensive…
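The two phases described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state how suspicious neurons are selected or how the cosine similarity term enters the loss, so this sketch assumes (a) neurons are ranked by how much their weights move during loss-maximizing unlearning, and (b) the relearning loss adds a penalty on the cosine similarity between each recalibrated neuron's weights and its original weights. The function names and the parameters `lr`, `k`, and `alpha` are hypothetical.

```python
import math


def ascent_step(weights, grads, lr):
    """One unlearning step: move weights UP the loss gradient (loss maximization)."""
    return [[w + lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


def rank_suspicious_neurons(weights_before, weights_after, k):
    """Assumed criterion: the k neurons whose weight rows changed most (L1 distance)."""
    changes = [sum(abs(a - b) for a, b in zip(row_b, row_a))
               for row_b, row_a in zip(weights_before, weights_after)]
    order = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
    return order[:k]


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relearn_loss(task_loss, new_rows, old_rows, alpha):
    """Assumed relearning objective: clean-data loss plus a cosine similarity
    penalty that discourages recalibrated neurons from aligning with their
    original (possibly backdoored) weights."""
    penalty = sum(cosine_similarity(n, o) for n, o in zip(new_rows, old_rows))
    return task_loss + alpha * penalty
```

In a real network, each row would be the incoming weights of one neuron in a layer, the gradients would come from backpropagation on the clean samples, and the selected rows would be reinitialized before minimizing the relearning loss.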
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparse Evolutionary Training
