Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs

Swanand Ravindra Kadhe; Farhan Ahmed; Dennis Wei; Nathalie Baracaldo; Inkit Padhi

arXiv:2406.11780·cs.LG·June 18, 2024

TL;DR

This paper introduces SPUNGE, a framework that enhances unlearning in large language models by leveraging data attributes to improve safety and effectiveness without sacrificing general capabilities.

Contribution

The paper proposes SPUNGE, a framework that amplifies the effectiveness of unlearning in LLMs in three steps: it splits the unlearning data into subsets by attribute value, unlearns each subset separately, and merges the resulting unlearned models. It can be used with any unlearning method.

Findings

01. SPUNGE significantly improves the performance of two recent unlearning methods on state-of-the-art LLMs.

02. SPUNGE maintains the models' general capabilities on standard academic benchmarks.

03. SPUNGE can be used with any unlearning method.

Abstract

Large language models (LLMs) have been shown to pose social and ethical risks such as generating toxic language or facilitating malicious use of hazardous knowledge. Machine unlearning is a promising approach to improve LLM safety by directly removing harmful behaviors and knowledge. In this paper, we propose "SPlit, UNlearn, MerGE" (SPUNGE), a framework that can be used with any unlearning method to amplify its effectiveness. SPUNGE leverages data attributes during unlearning by splitting unlearning data into subsets based on specific attribute values, unlearning each subset separately, and merging the unlearned models. We empirically demonstrate that SPUNGE significantly improves the performance of two recent unlearning methods on state-of-the-art LLMs while maintaining their general capabilities on standard academic benchmarks.
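The split, unlearn, merge pipeline described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: models are plain dictionaries of float lists, `toy_unlearn` is a hypothetical stand-in for a real unlearning method, and uniform parameter averaging is assumed as the merge step because the abstract does not say which merging technique is used. All function names here are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence

Params = Dict[str, List[float]]   # toy "model": parameter name -> weights
Example = Dict[str, object]       # toy data point carrying attribute fields


def split_by_attribute(data: Sequence[Example],
                       attribute: str) -> Dict[Hashable, List[Example]]:
    """Split: group the unlearning data by the value of one attribute."""
    subsets = defaultdict(list)
    for example in data:
        subsets[example[attribute]].append(example)
    return dict(subsets)


def merge_by_averaging(models: Sequence[Params]) -> Params:
    """Merge: uniform parameter averaging (an assumed, simple merge rule)."""
    if not models:
        raise ValueError("need at least one model to merge")
    merged = {}
    for name in models[0]:
        columns = zip(*(model[name] for model in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def spunge(base: Params,
           forget_data: Sequence[Example],
           attribute: str,
           unlearn: Callable[[Params, List[Example]], Params]) -> Params:
    """Split the data, unlearn each subset from the same base, then merge."""
    subsets = split_by_attribute(forget_data, attribute)
    unlearned = [unlearn(base, subset) for subset in subsets.values()]
    return merge_by_averaging(unlearned)


def toy_unlearn(model: Params, subset: List[Example]) -> Params:
    """Hypothetical unlearning step: shift every weight by the subset size."""
    return {name: [w - 0.5 * len(subset) for w in weights]
            for name, weights in model.items()}


base = {"w": [1.0, 2.0]}
forget_data = [
    {"text": "x1", "group": "a"},
    {"text": "x2", "group": "a"},
    {"text": "x3", "group": "b"},
]
merged = spunge(base, forget_data, "group", toy_unlearn)
```

The `unlearn` argument is an arbitrary callable, which mirrors the claim that the framework wraps any unlearning method; swapping in a different merge rule would only require replacing `merge_by_averaging`.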

Taxonomy

Topics: Diverse Research and Applications