Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning

Dayong Ye; Tianqing Zhu; Jiayang Li; Kun Gao; Bo Liu; Leo Yu Zhang; Wanlei Zhou; Yang Zhang

arXiv:2501.16663·cs.CR·July 17, 2025

TL;DR

This paper investigates how data duplication affects machine unlearning. It shows that duplicates can persist after unlearning, degrade model performance, and evade de-duplication detection in standard, federated, and reinforcement unlearning.

Contribution

The paper introduces adversarial duplication techniques and analyzes their impact on unlearning effectiveness and on detection methods across multiple unlearning paradigms.

Findings

1. Retraining from scratch may fail to unlearn duplicated data effectively.
2. Duplicated data can cause significant model degradation.
3. Crafted duplicates can evade de-duplication detection.

Abstract

Duplication is a prevalent issue within datasets. Existing research has demonstrated that the presence of duplicated data in training datasets can significantly influence both model performance and data privacy. However, the impact of data duplication on the unlearning process remains largely unexplored. This paper addresses this gap by pioneering a comprehensive investigation into the role of data duplication, not only in standard machine unlearning but also in federated and reinforcement unlearning paradigms. Specifically, we propose an adversary who duplicates a subset of the target model's training set and incorporates it into the training set. After training, the adversary requests the model owner to unlearn this duplicated subset, and analyzes the impact on the unlearned model. For example, the adversary can challenge the model owner by revealing that, despite efforts to unlearn…
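
The threat model described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes exact (unmodified) duplicates and an owner who unlearns by dropping the requested rows before retraining, and the helper names `duplicate_subset`, `unlearn_by_removal`, and `count_surviving` are hypothetical. It only shows why removing the duplicated copies can leave the same information in the training set.

```python
import numpy as np

def duplicate_subset(X, y, idx):
    """Adversary step (assumed): append exact copies of rows `idx` to the training set."""
    X_aug = np.concatenate([X, X[idx]], axis=0)
    y_aug = np.concatenate([y, y[idx]], axis=0)
    # Positions of the appended copies inside the augmented set.
    dup_idx = np.arange(len(X), len(X) + len(idx))
    return X_aug, y_aug, dup_idx

def unlearn_by_removal(X, y, forget_idx):
    """Owner step (assumed): drop the requested rows; retraining would use what remains."""
    keep = np.ones(len(X), dtype=bool)
    keep[forget_idx] = False
    return X[keep], y[keep]

def count_surviving(X_remaining, X_forgotten):
    """Count forgotten rows that still have an exact match in the remaining set."""
    remaining = {row.tobytes() for row in X_remaining}
    return sum(row.tobytes() in remaining for row in X_forgotten)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# The adversary duplicates the first 10 samples, then asks to unlearn the copies.
X_aug, y_aug, dup_idx = duplicate_subset(X, y, np.arange(10))
X_left, y_left = unlearn_by_removal(X_aug, y_aug, dup_idx)

# Every "forgotten" row is still present through its original.
survivors = count_surviving(X_left, X_aug[dup_idx])
print(f"{survivors} of {len(dup_idx)} unlearned rows still appear in the retained set")
```

In this setting, a model retrained on `X_left` sees exactly the original dataset, so the unlearning request changes nothing about what the model can learn from those samples.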

Taxonomy

Topics: Internet of Things and AI · Brain Tumor Detection and Classification · Network Security and Intrusion Detection

Methods: Sparse Evolutionary Training