Efficient Backdoor Removal Through Natural Gradient Fine-tuning
Nazmul Karim, Abdullah Al Arafat, Umar Khalid, Zhishan Guo, Nazanin Rahnavard

TL;DR
This paper introduces Natural Gradient Fine-tuning (NGF), a novel method for backdoor removal in neural networks that fine-tunes only one layer using a geometry-aware optimizer and a regularizer, achieving state-of-the-art results.
Contribution
The paper proposes NGF, a backdoor purification technique that fine-tunes a single layer with a geometry-aware optimizer and a regularizer based on Fisher Information, reducing computational costs and improving performance.
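To make the idea of fine-tuning one layer with a geometry-aware optimizer concrete, here is a minimal sketch of a natural-gradient-style update, which rescales the ordinary gradient by an estimate of the Fisher Information. It is not the authors' implementation: the summary above does not give the exact optimizer or regularizer, so the sketch assumes a damped diagonal empirical Fisher, omits the regularizer entirely, and uses a toy two-layer network with random data. All names (`features`, `natural_gradient_step`, `W1`, `W2`, `damping`) are hypothetical.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def features(X, W1):
    # Frozen part of the toy network; W1 is never updated.
    return np.tanh(X @ W1)

def loss(W2, H, Y):
    # Mean cross-entropy of the trainable layer on one-hot labels Y.
    return -np.mean(np.sum(Y * log_softmax(H @ W2), axis=1))

def natural_gradient_step(W2, H, Y, lr=0.1, damping=1.0):
    # Assumption: diagonal empirical Fisher with damping, not the paper's exact scheme.
    P = np.exp(log_softmax(H @ W2))            # class probabilities, shape (n, k)
    G = H[:, :, None] * (P - Y)[:, None, :]    # per-sample gradients, shape (n, h, k)
    grad = G.mean(axis=0)                      # ordinary gradient of the mean loss
    fisher_diag = (G ** 2).mean(axis=0)        # diagonal empirical Fisher estimate
    return W2 - lr * grad / (fisher_diag + damping)

rng = np.random.default_rng(0)
n, d, h, k = 64, 5, 8, 3
X = rng.normal(size=(n, d))                    # stand-in for a small clean validation set
Y = np.eye(k)[rng.integers(0, k, size=n)]      # random one-hot labels, illustration only
W1 = rng.normal(size=(d, h))                   # frozen layer
W1_before = W1.copy()
H = features(X, W1)

W2 = np.zeros((h, k))                          # the single layer being fine-tuned
loss_before = loss(W2, H, Y)
for _ in range(20):
    W2 = natural_gradient_step(W2, H, Y)
loss_after = loss(W2, H, Y)
print(loss_before, loss_after)
```

Only `W2` changes during the loop, which is where the computational saving of single-layer fine-tuning comes from in this toy setting. The large damping value is a deliberate choice to keep the steps conservative; a full Fisher matrix or a structured approximation would capture more of the loss geometry at higher cost.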
Findings
NGF effectively removes backdoors across multiple datasets and attacks.
NGF achieves state-of-the-art backdoor defense performance.
NGF reduces computational costs by fine-tuning only one layer.
Abstract
The success of a deep neural network (DNN) relies heavily on the details of the training scheme, e.g., training data, architectures, and hyper-parameters. Recent backdoor attacks suggest that an adversary can take advantage of such training details and compromise the integrity of a DNN. Our studies show that a backdoor model is usually optimized to a bad local minimum, i.e., a sharper minimum than that of a benign model. Intuitively, a backdoor model can be purified by re-optimizing it to a smoother minimum through fine-tuning with a small amount of clean validation data. However, fine-tuning all DNN parameters is computationally expensive and often results in sub-par clean test performance. To address this concern, we propose a novel backdoor purification technique, Natural Gradient Fine-tuning (NGF), which focuses on removing the backdoor by fine-tuning only one layer. Specifically,…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Anomaly Detection Techniques and Applications
