RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality
Chenlong Zhang, Zhuoran Jin, Hongbang Yuan, Jiaheng Wei, Tong Zhou, Kang Liu, Jun Zhao, Yubo Chen

TL;DR
RULE introduces a reinforcement learning framework for targeted unlearning in large language models, effectively removing specific information while maintaining overall utility and improving response naturalness.
Contribution
The paper presents RULE, a reinforcement learning-based unlearning method that formulates unlearning as a refusal boundary optimization problem. Trained on a small portion of the forget set plus synthesized boundary queries, it achieves forget-retain Pareto optimality.
Findings
Outperforms existing methods in forget quality and response naturalness.
Achieves unlearning with only 12% of the forget set and 8% boundary data.
Enhances model generalization.
Abstract
The widespread deployment of Large Language Models (LLMs) trained on massive, uncurated corpora has raised growing concerns about the inclusion of sensitive, copyrighted, or illegal content. This has led to increasing interest in LLM unlearning: the task of selectively removing specific information from a model without retraining from scratch or degrading overall utility. However, existing methods often rely on large-scale forget and retain datasets, and suffer from unnatural responses, poor generalization, or catastrophic utility loss. In this work, we propose Reinforcement UnLearning (RULE), an efficient framework that formulates unlearning as a refusal boundary optimization problem. RULE is trained with a small portion of the forget set and synthesized boundary queries, using a verifiable reward function that encourages safe refusal on forget-related queries while preserving helpful…
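As a rough illustration of the reward idea described in the abstract, the Python sketch below scores a response by whether it refuses on a forget-related query and answers on a boundary query. It is a minimal sketch under stated assumptions, not the paper's reward: the names `REFUSAL_MARKERS`, `is_refusal`, and `refusal_boundary_reward`, the keyword-based refusal check, and the 0/1 reward values are all hypothetical.

```python
# Hypothetical refusal phrases; the paper's actual verifier is not specified here.
REFUSAL_MARKERS = ("i cannot", "i can't", "i am unable to", "i'm not able to")


def is_refusal(response: str) -> bool:
    """Crude keyword check for whether a response reads as a refusal (assumption)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_boundary_reward(query_kind: str, response: str) -> float:
    """Return 1.0 when the response is on the desired side of the refusal boundary.

    query_kind is "forget" (the model should refuse) or "boundary"
    (a related but permissible query the model should still answer).
    The 0/1 scale is an illustrative choice, not taken from the paper.
    """
    if query_kind not in ("forget", "boundary"):
        raise ValueError(f"unknown query kind: {query_kind!r}")
    refused = is_refusal(response)
    if query_kind == "forget":
        return 1.0 if refused else 0.0
    return 0.0 if refused else 1.0
```

In a reinforcement learning loop, a score like this would be computed for each sampled response and passed to the policy update. Because it depends only on a checkable property of the output, it can be verified without a learned reward model.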
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Advanced Graph Neural Networks
