Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

Nay Myat Min; Long H. Pham; Jun Sun

arXiv:2405.14781 · cs.CR · June 25, 2025

TL;DR

This paper introduces ULRL, a two-phase method that removes backdoors from neural networks using only a small clean dataset. It first unlearns on that data to expose suspicious neurons, then relearns those neurons, neutralizing the backdoor while preserving accuracy on benign inputs.

Contribution

ULRL is presented as the first approach to combine unlearning and relearning for backdoor removal when only a few clean samples are available, with the aim of improving robustness and efficiency.

Findings

01. Significantly reduces attack success rate across multiple datasets and architectures.

02. Maintains high clean accuracy even with only 1% clean data used for defense.

03. Effective against 12 different backdoor types.

Abstract

Deep neural networks have achieved remarkable success across various applications; however, their vulnerability to backdoor attacks poses severe security risks -- especially in situations where only a limited set of clean samples is available for defense. In this work, we address this critical challenge by proposing ULRL (UnLearn and ReLearn for backdoor removal), a novel two-phase approach for comprehensive backdoor removal. Our method first employs an unlearning phase, in which the network's loss is intentionally maximized on a small clean dataset to expose neurons that are excessively sensitive to backdoor triggers. Subsequently, in the relearning phase, these suspicious neurons are recalibrated using targeted reinitialization and cosine similarity regularization, effectively neutralizing backdoor influences while preserving the model's performance on benign data. Extensive…
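The two phases described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation. It assumes, purely for illustration, that neurons are flagged by how far their weights move during unlearning, and that the cosine similarity term penalizes relearned weights for staying aligned with the original (possibly backdoored) weights. All function names, the L1 shift criterion, and the toy weight values are hypothetical.

```python
import math


def per_neuron_shift(before, after):
    """L1 change of each neuron's weight row between two snapshots."""
    return [
        sum(abs(a - b) for a, b in zip(row_after, row_before))
        for row_before, row_after in zip(before, after)
    ]


def flag_suspicious(before, after, k):
    """Indices of the k neurons whose weights moved most (assumed criterion)."""
    shifts = per_neuron_shift(before, after)
    ranked = sorted(range(len(shifts)), key=lambda i: shifts[i], reverse=True)
    return sorted(ranked[:k])


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_penalty(original, current, neuron_ids):
    """Mean |cosine| between original and current rows of the flagged neurons.

    Assumed form of the regularizer: a smaller value means the relearned
    neurons point away from their original directions.
    """
    if not neuron_ids:
        return 0.0
    total = sum(
        abs(cosine_similarity(original[i], current[i])) for i in neuron_ids
    )
    return total / len(neuron_ids)


# Toy weights: one row per neuron (hypothetical values).
w_backdoored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_unlearned = [[1.0, 0.1], [2.0, -1.0], [1.0, 1.0]]   # after loss maximization
w_relearned = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]    # neuron 1 reinitialized and retrained

suspicious = flag_suspicious(w_backdoored, w_unlearned, k=1)
penalty = cosine_penalty(w_backdoored, w_relearned, suspicious)
```

In this toy setting neuron 1 is flagged because its weights move the most, and the penalty is zero because its relearned row is orthogonal to the original one. In a real defense the penalty would be added to the clean-data training loss during relearning; the weighting between the two terms is not specified in the text above.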

Code & Models

Repositories

naymyatmin/ulrl (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training