Unlearning Backdoor Attacks through Gradient-Based Model Pruning

Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, and Raja Jurdak

arXiv:2405.03918 · cs.LG · May 8, 2024

TL;DR

This paper introduces a gradient-based model pruning method that unlearns backdoor attacks in neural networks. It uses the gradients of an unlearning loss to identify and remove the model components responsible for the backdoor, and it is designed to remain effective when only limited data is available.

Contribution

The paper frames backdoor mitigation as an unlearning task and addresses it with targeted model pruning guided by unlearning-loss gradients. The authors present the approach as theoretically grounded and effective in practice.

Findings

01 Effective backdoor mitigation with limited data

02 Outperforms state-of-the-art methods in realistic scenarios

03 Simple and theoretically grounded approach

Abstract

In the era of increasing concerns over cybersecurity threats, defending against backdoor attacks is paramount in ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial amounts of data for effective mitigation, posing significant challenges in practical deployment. To address this, we propose a novel approach to counter backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model pruning strategy, leveraging unlearning loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach offers simplicity and effectiveness, rendering it well-suited for scenarios with limited data availability. Our methodology includes formulating a suitable unlearning loss and devising a model-pruning technique tailored for…
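The idea of using unlearning-loss gradients to decide what to prune can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and the abstract above is truncated before the details are given, so every specific here is an assumption: the toy logistic model, the unlearning loss (cross-entropy on clean samples plus cross-entropy on triggered samples relabelled with their true classes), the first-order saliency score `w_i * g_i`, and the hypothetical helper names `ce_gradient` and `prune_by_unlearning_gradient`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_prob(w, x):
    # Toy logistic model without bias: p(y=1 | x) = sigmoid(w . x).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def ce_gradient(w, samples):
    # Mean gradient of binary cross-entropy w.r.t. w: mean((p - y) * x).
    grad = [0.0] * len(w)
    for x, y in samples:
        err = predict_prob(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(samples)
    return grad


def prune_by_unlearning_gradient(w, clean, triggered, k):
    # ASSUMED unlearning loss: CE(clean, true labels) + CE(triggered, true labels).
    grad = [a + b for a, b in zip(ce_gradient(w, clean), ce_gradient(w, triggered))]
    # ASSUMED saliency: zeroing w_i changes the loss by about -w_i * g_i,
    # so a large positive w_i * g_i marks a weight whose removal lowers the loss.
    scores = [wi * gi for wi, gi in zip(w, grad)]
    ranked = sorted(range(len(w)), key=lambda i: scores[i], reverse=True)
    pruned = [i for i in ranked[:k] if scores[i] > 0]
    new_w = [0.0 if i in pruned else wi for i, wi in enumerate(w)]
    return new_w, pruned


def accuracy(w, samples):
    hits = sum((predict_prob(w, x) >= 0.5) == bool(y) for x, y in samples)
    return hits / len(samples)


# Toy setup: feature 3 plays the role of the trigger, and its large weight
# pushes every triggered input towards class 1 (the attacker's target).
w = [2.0, -2.0, 0.1, 6.0]
clean = [([1, 0, 0, 0], 1), ([0, 1, 0, 0], 0), ([1, 0, 1, 0], 1), ([0, 1, 1, 0], 0)]
# The same inputs with the trigger feature switched on, keeping the true labels.
triggered = [(x[:3] + [1], y) for x, y in clean]

w_pruned, pruned = prune_by_unlearning_gradient(w, clean, triggered, k=1)
print("pruned indices:", pruned)
print("triggered accuracy before/after:", accuracy(w, triggered), accuracy(w_pruned, triggered))
print("clean accuracy after:", accuracy(w_pruned, clean))
```

In this toy setup the only weight selected for pruning is the one attached to the trigger feature; zeroing it restores correct predictions on the triggered inputs and leaves clean accuracy unchanged. A real network would need the same scoring applied per filter or neuron using automatic differentiation, and the choice of loss and pruning threshold would follow the paper rather than the assumptions made here.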

Code & Models

Repositories

whodunnett/grad-prune
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection

Methods: Pruning