How to Protect Models against Adversarial Unlearning?

Patryk Jasiorski; Marek Klonowski; Michał Woźniak

arXiv:2507.10886·cs.LG·July 16, 2025

TL;DR

This paper explores the challenge of adversarial unlearning in AI models, where malicious requests aim to degrade performance, and proposes a novel method to safeguard models against such attacks and unintended unlearning effects.

Contribution

It introduces a new approach to protect models from performance deterioration caused by both spontaneous and malicious unlearning requests.

Findings

1. Adversarial unlearning depends on model and data selection strategies.
2. The proposed method effectively mitigates performance loss from unlearning attacks.
3. Protection mechanism maintains model accuracy under adversarial conditions.

Abstract

AI models may need to unlearn data to meet the requirements of legal acts such as the AI Act or GDPR, and also to remove toxic content, reduce bias, eliminate the impact of malicious instances, or adapt to changes in the data distribution in which a model operates. Unfortunately, removing knowledge may cause undesirable side effects, such as a deterioration in model performance. In this paper, we investigate the problem of adversarial unlearning, where a malicious party intentionally sends unlearn requests chosen to degrade the model's performance as much as possible. We show that this phenomenon and the adversary's capabilities depend on many factors, primarily on the backbone model itself and on the strategy and limitations in selecting the data to be unlearned. The main result of this work is a new method of protecting model performance from these side effects, both in the case of unlearned behavior…
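To make the threat model in the abstract concrete, the following toy sketch compares an arbitrary unlearn request with one chosen adversarially. It is not the paper's method or experimental setup: the 1-D data, the nearest-centroid model, exact unlearning by retraining, the exhaustive search, and the assumption that the adversary can evaluate accuracy on held-out data are all hypothetical choices made only for illustration.

```python
from itertools import combinations

# Toy 1-D data as (feature, label) pairs; values are illustrative assumptions.
TRAIN = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
TEST = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (4.5, 0),
        (5.5, 1), (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]


def fit(data):
    """Nearest-centroid model: one mean per class (stand-in for a real model)."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        centroids[label] = sum(xs) / len(xs)
    return centroids


def accuracy(centroids, data):
    """Fraction of points whose nearest centroid matches their label."""
    correct = 0
    for x, y in data:
        pred = 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
        correct += pred == y
    return correct / len(data)


def unlearn(data, forget):
    """Exact unlearning: retrain from scratch without the forgotten indices."""
    kept = [p for i, p in enumerate(data) if i not in forget]
    return fit(kept)


def worst_case_forget_set(train, test, budget):
    """Hypothetical adversary: try every forget set of size `budget` and
    keep the one that leaves the retrained model with the lowest accuracy."""
    best_set, best_acc = None, float("inf")
    for forget in combinations(range(len(train)), budget):
        acc = accuracy(unlearn(train, set(forget)), test)
        if acc < best_acc:
            best_set, best_acc = forget, acc
    return best_set, best_acc


baseline_acc = accuracy(fit(TRAIN), TEST)
benign_acc = accuracy(unlearn(TRAIN, {0, 2, 7}), TEST)  # arbitrary request
adv_set, adv_acc = worst_case_forget_set(TRAIN, TEST, budget=3)
print(baseline_acc, benign_acc, adv_set, adv_acc)
```

With the same budget of three removals, the arbitrary request leaves accuracy unchanged while the adversarially selected one shifts the decision boundary enough to misclassify a held-out point. Exhaustive search is only feasible at this toy scale; a realistic adversary would be constrained in which data it can select, which is one of the factors the abstract highlights.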

Taxonomy

Topics: Adversarial Robustness in Machine Learning