Potion: Towards Poison Unlearning

Stefan Schoepf; Jack Foster; Alexandra Brintrup

arXiv:2406.09173 · cs.LG · September 12, 2024 · 1 citation

TL;DR

This paper introduces a poison unlearning method that removes poison triggers from already trained models when only a subset of the poisoned data has been identified. It heals more poison than existing techniques such as SSD at a smaller cost in model accuracy, and outperforms full retraining in both effectiveness and efficiency.

Contribution

We propose a new outlier-resistant unlearning approach and a hyperparameter search method, Poison Trigger Neutralisation (PTN), to improve poison unlearning when the poisoned subset is partially unknown.

Findings

01 Our method heals 93.72% of poison, compared with 83.41% for SSD.

02 The model accuracy drop caused by unlearning is reduced from 5.68% (SSD) to 1.41%.

03 Our method outperforms full retraining in both effectiveness and efficiency.

Abstract

Adversarial attacks by malicious actors on machine learning systems, such as introducing poison triggers into training datasets, pose significant risks. The challenge in resolving such an attack arises in practice when only a subset of the poisoned data can be identified. This necessitates the development of methods to remove, i.e. unlearn, poison triggers from already trained models with only a subset of the poison data available. The requirements for this task significantly deviate from privacy-focused unlearning where all of the data to be forgotten by the model is known. Previous work has shown that the undiscovered poisoned samples lead to a failure of established unlearning methods, with only one method, Selective Synaptic Dampening (SSD), showing limited success. Even full retraining, after the removal of the identified poison, cannot address this challenge as the undiscovered…
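
The abstract names Selective Synaptic Dampening (SSD) as the one prior method with limited success. As background only, the following is a minimal sketch of the SSD dampening rule as commonly described for that method: a parameter is selected when its importance on the forget set exceeds a multiple of its importance on the full training data, and selected parameters are scaled down. It is not the method proposed in this paper. The function name `ssd_dampen`, the argument names, and the example values are hypothetical, and the importances (typically diagonal Fisher information estimates) are assumed to be precomputed.

```python
from typing import List


def ssd_dampen(
    params: List[float],
    imp_forget: List[float],
    imp_retain: List[float],
    alpha: float = 10.0,
    lam: float = 1.0,
) -> List[float]:
    """Return a dampened copy of `params` (illustrative sketch of SSD-style dampening).

    imp_forget: per-parameter importance estimated on the forget (poison) set.
    imp_retain: per-parameter importance estimated on the full training data.
    alpha: selection threshold (assumed > 0).
    lam: dampening constant.
    """
    if not (len(params) == len(imp_forget) == len(imp_retain)):
        raise ValueError("all inputs must have the same length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    dampened = []
    for theta, i_f, i_d in zip(params, imp_forget, imp_retain):
        if i_f > alpha * i_d:
            # Parameter is disproportionately important to the forget set:
            # shrink it, never scaling by more than 1.
            beta = min(lam * i_d / i_f, 1.0)
            dampened.append(beta * theta)
        else:
            # Parameter is left unchanged.
            dampened.append(theta)
    return dampened


# Hypothetical toy values: only the second parameter is selected.
example = ssd_dampen(
    params=[0.5, -1.2, 2.0],
    imp_forget=[0.01, 4.0, 0.2],
    imp_retain=[0.02, 0.1, 0.5],
)
print(example)
```

In this sketch the two hyperparameters `alpha` and `lam` control how many parameters are selected and how strongly they are dampened, which is the kind of setting a hyperparameter search would need to choose.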

Code & Models

Repositories

if-loops/towards_poison_unlearning
PyTorch · Official

Taxonomy

Topics: Poisoning and overdose treatments

Methods: Sparse Evolutionary Training