Mitigating Backdoor Attacks using Activation-Guided Model Editing
Felix Hsieh, Huy H. Nguyen, AprilPyone MaungMaung, Dmitrii Usynin, Isao Echizen

TL;DR
This paper introduces a computationally efficient machine unlearning method that uses activation-guided model editing to effectively mitigate backdoor attacks in machine learning models, requiring only a few unseen samples.
Contribution
It presents a novel, efficient backdoor mitigation technique leveraging activation-guided model editing and introduces a repair step to maintain model utility.
Findings
Effective backdoor unlearning across multiple datasets
Requires only a few unseen samples for unlearning
Outperforms previous mitigation methods in accuracy and efficiency
Abstract
Backdoor attacks compromise the integrity and reliability of machine learning models by embedding a hidden trigger during training, which can later be activated to cause unintended misbehavior. We propose a novel backdoor mitigation approach based on machine unlearning to counter such attacks. The proposed method uses the model's activations on domain-equivalent unseen data to guide the editing of its weights. Unlike previous unlearning-based mitigation methods, ours is computationally inexpensive and achieves state-of-the-art performance while requiring only a handful of unseen samples for unlearning. In addition, we point out that unlearning the backdoor may cause the whole targeted class to be unlearned, and we therefore introduce an additional repair step to preserve the model's utility after editing. Experimental results show that the proposed method is…
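To make "activation-guided editing of the model's weights" more concrete, here is a minimal sketch of one generic way such an edit can work. It is not the authors' algorithm, which this abstract does not specify. It assumes a single linear layer is edited, and that for a few unseen samples we have paired layer inputs with and without the trigger (an assumption, not stated in the abstract). The helper `activation_guided_edit`, the additive `trigger` vector, and all dimensions are hypothetical. The edit is the minimum-norm least-squares update that makes triggered activations produce the same layer output as their clean counterparts.

```python
import numpy as np


def activation_guided_edit(W, acts_trig, acts_clean):
    """Hypothetical helper: minimum-norm edit of one linear layer.

    W:          (d_out, d_in) weight matrix of the layer being edited.
    acts_trig:  (n, d_in) layer inputs for n samples carrying the trigger.
    acts_clean: (n, d_in) layer inputs for the same n samples without it.
    Returns W_new with W_new @ a_trig == W @ a_clean for every pair
    (exact when the triggered activations are linearly independent).
    """
    K = acts_trig.T          # (d_in, n) activations whose outputs we remap
    V = W @ acts_clean.T     # (d_out, n) outputs the layer gives on clean inputs
    R = V - W @ K            # (d_out, n) correction the edit must supply
    # Minimum-Frobenius-norm solution of delta @ K = R via the pseudoinverse.
    delta = R @ np.linalg.pinv(K)
    return W + delta


# Usage on synthetic data: 5 "unseen" samples, 64-dim layer inputs.
rng = np.random.default_rng(0)
d_in, d_out, n = 64, 10, 5
W = rng.normal(size=(d_out, d_in))
clean = rng.normal(size=(n, d_in))
trigger = rng.normal(size=d_in)   # hypothetical additive trigger in activation space
trig = clean + trigger
W_new = activation_guided_edit(W, trig, clean)
print(np.allclose(W_new @ trig.T, W @ clean.T))  # True
```

Because the pseudoinverse yields the minimum-norm update, the edited layer behaves exactly like the original on any input orthogonal to the span of the edited activations. Inputs that overlap that span, such as genuine target-class samples, can still shift, which is plausibly the kind of side effect the repair step described above is meant to address.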
Taxonomy
Topics: Security and Verification in Computing · Simulation Techniques and Applications · Real-time simulation and control systems
