ReLearn: Unlearning via Learning for Large Language Models

Haoming Xu; Ningyuan Zhao; Liming Yang; Sendong Zhao; Shumin Deng; Mengru Wang; Bryan Hooi; Nay Oo; Huajun Chen; Ningyu Zhang

arXiv:2502.11190 · cs.CL · May 29, 2025

TL;DR

ReLearn is a novel unlearning method for large language models that uses data augmentation and fine-tuning to effectively forget specific knowledge without degrading language generation quality.

Contribution

The paper introduces ReLearn, a new unlearning framework with a comprehensive evaluation system, addressing limitations of reverse optimization methods in large language models.

Findings

1. ReLearn achieves targeted knowledge forgetting while maintaining output quality.

2. The evaluation metrics KFR, KRR, and LS effectively measure forgetting and retention.

3. ReLearn preserves linguistic coherence better than reverse optimization methods.

Abstract

Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts the prediction of subsequent tokens, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further…
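The contrast the abstract draws between reverse optimization and a learning-based update can be illustrated with a toy example. The sketch below is not the ReLearn pipeline and makes several simplifying assumptions: it updates a single four-entry logit vector directly rather than model weights, the vocabulary roles (`FORGET`, `REPLACE`, and the two other tokens) are hypothetical, and the learning rate and step count are arbitrary. It only shows that gradient ascent on the negative log-likelihood lowers the target token's probability without saying which token should take its place, whereas ordinary fine-tuning toward a replacement answer does both.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_nll(logits, target):
    """Gradient of -log softmax(logits)[target] w.r.t. the logits."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def step(logits, target, lr, ascent):
    """One update on the logits.

    ascent=True  -> raise the NLL of `target` (reverse optimization).
    ascent=False -> lower the NLL of `target` (ordinary fine-tuning).
    """
    sign = 1.0 if ascent else -1.0
    grads = grad_nll(logits, target)
    return [x + sign * lr * g for x, g in zip(logits, grads)]

# Hypothetical toy vocabulary (an assumption for illustration only):
# 0 = answer to forget, 1 = acceptable replacement, 2 and 3 = unrelated tokens.
FORGET, REPLACE = 0, 1
start = [3.0, 0.5, 1.0, 0.0]

reverse = list(start)   # gradient ascent on the token to forget
relearn = list(start)   # gradient descent toward the replacement token
for _ in range(50):
    reverse = step(reverse, FORGET, lr=0.5, ascent=True)
    relearn = step(relearn, REPLACE, lr=0.5, ascent=False)

p_start = softmax(start)
p_reverse = softmax(reverse)
p_relearn = softmax(relearn)

for name, probs in [("start", p_start), ("reverse", p_reverse), ("relearn", p_relearn)]:
    print(name, [round(p, 3) for p in probs])
```

In this toy setting both updates reduce the probability of the `FORGET` token, but only the learning-style update concentrates the freed probability mass on the `REPLACE` token; under ascent the mass drifts to whichever other token already had the largest logit.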

Code & Models

Repositories

zjunlp/unlearn (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques