Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning

Hwan Chang; Hwanhee Lee

arXiv:2502.11441 · cs.CL · May 29, 2025

TL;DR

This paper investigates how unlearning in large language models affects different subsets of the retain set. It finds that queries syntactically similar to the forget data suffer the greatest performance drop, and that using them for regularization improves performance across various data subsets.

Contribution

The paper introduces the Syntactically Similar Neighbor Set, a group of queries that share similar syntactic structures with the data targeted for removal, and demonstrates its significance for LLM unlearning and performance preservation.
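To make the idea of a syntactically similar neighbor concrete, here is a minimal sketch. It is not the paper's construction procedure, which this page does not describe. It assumes, purely for illustration, that syntactic similarity can be approximated by masking the entity name and comparing the remaining token sequence. The function names, the `[ENT]` slot, the threshold, and the example queries are all hypothetical.

```python
from difflib import SequenceMatcher


def to_template(query: str, entity: str, slot: str = "[ENT]") -> str:
    """Replace the entity mention with a slot so only the query's structure remains."""
    return query.replace(entity, slot)


def structural_similarity(a: str, b: str) -> float:
    """Token-level similarity ratio between two templates, in [0, 1]."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def syntactic_neighbors(forget, candidates, threshold=0.8):
    """Keep candidate (query, entity) pairs whose template resembles a forget template.

    Assumption: a fixed similarity threshold stands in for whatever criterion
    the paper actually uses.
    """
    forget_templates = [to_template(q, e) for q, e in forget]
    return [
        (q, e)
        for q, e in candidates
        if max(structural_similarity(to_template(q, e), t) for t in forget_templates)
        >= threshold
    ]


# Hypothetical data: one forget query and two retain candidates about another entity.
forget_set = [("Where was Alice Example born?", "Alice Example")]
retain_pool = [
    ("Where was Bob Sample born?", "Bob Sample"),
    ("List three novels written by Bob Sample.", "Bob Sample"),
]
neighbors = syntactic_neighbors(forget_set, retain_pool)
```

Only the first candidate shares the forget query's structure once the entity is masked, so it is the only one selected. Note that the selection ignores which entity or domain the query is about.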

Findings

01. Queries that are syntactically similar to the forget data suffer the greatest performance drop during unlearning.

02. Using the Syntactically Similar Neighbor Set for regularization improves performance across various data subsets.

03. Syntactic similarity matters more for unlearning effectiveness than domain or entity relationships.

Abstract

Large language models (LLMs) risk retaining unauthorized or sensitive information from their training data, which raises privacy concerns. LLM unlearning seeks to mitigate these risks by selectively removing specified data while maintaining overall model performance. However, most existing work focuses on methods to achieve effective forgetting and does not provide a detailed analysis of the retain set, the portion of training data that is not targeted for removal. In this paper, we investigate the effects of unlearning on various subsets of the retain set through a case study on entity unlearning. We introduce the Syntactically Similar Neighbor Set, a group of queries that share similar syntactic structures with the data targeted for removal, and show that this subset suffers the greatest performance drop during unlearning. Moreover, when used for regularization, this set not only…
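The abstract does not spell out the training objective, so the following is only a sketch of one common way a retain set is used for regularization. It assumes the model is pushed to increase loss on the forget set while a weighted penalty keeps loss low on the retain set. The function name, the `retain_weight` parameter, and the numbers are illustrative assumptions, not values from the paper.

```python
def unlearning_objective(forget_nll, retain_nll, retain_weight=1.0):
    """Scalar objective to minimize (assumed formulation, not the paper's).

    forget_nll: per-example negative log-likelihoods on the forget set
    retain_nll: per-example negative log-likelihoods on the chosen retain set,
                e.g. a syntactically similar neighbor set
    """
    mean_forget = sum(forget_nll) / len(forget_nll)
    mean_retain = sum(retain_nll) / len(retain_nll)
    # The negated forget term rewards forgetting; the retain term is the regularizer.
    return -mean_forget + retain_weight * mean_retain


# Hypothetical per-example losses.
objective = unlearning_objective([2.0, 4.0], [1.0, 3.0], retain_weight=0.5)
```

Under this formulation, the choice of retain set decides which behaviours the regularizer protects, which is the choice the paper examines.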
