Selective Forgetting for Large Reasoning Models
Tuan Le, Wei Qian, and Mengdi Huai

TL;DR
This paper introduces a selective unlearning framework for large reasoning models (LRMs). It removes sensitive components of the reasoning process while preserving overall reasoning ability, addressing the ethical and legal concerns raised by memorized copyrighted or private content.
Contribution
It proposes two components: a retrieval-augmented generation (RAG)-based analysis of reasoning traces, and a feature replacement unlearning loss. Together they aim to forget targeted knowledge precisely without degrading reasoning capabilities.
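This summary does not spell out the loss, so the following is only a rough illustration of why *selective* unlearning differs from unlearning a whole chain of thought. It is a minimal Python sketch that assumes some upstream step has already labeled each token of a reasoning trace as forget-relevant or retained. The function name `selective_cot_unlearning_loss`, the `-log(1 - p)` suppression term, and `retain_weight` are all hypothetical choices. This is not the authors' feature replacement loss.

```python
import math
from typing import Sequence


def selective_cot_unlearning_loss(
    token_probs: Sequence[float],
    forget_mask: Sequence[bool],
    retain_weight: float = 1.0,
    eps: float = 1e-12,
) -> float:
    """Hypothetical token-level objective (illustration only, not the paper's loss).

    token_probs: model probability assigned to each observed token of a CoT.
    forget_mask: True for tokens in segments flagged as forget-relevant.
    """
    if len(token_probs) != len(forget_mask):
        raise ValueError("token_probs and forget_mask must have the same length")
    forget = [p for p, m in zip(token_probs, forget_mask) if m]
    retain = [p for p, m in zip(token_probs, forget_mask) if not m]
    # Suppression on flagged tokens: -log(1 - p) is ~0 when p is ~0 and grows
    # while the model still predicts the token to be forgotten.
    suppress = sum(-math.log(max(1.0 - p, eps)) for p in forget) / max(len(forget), 1)
    # Retention on the rest of the chain: ordinary negative log-likelihood keeps
    # the unflagged reasoning steps likely, so general reasoning is not penalized.
    retain_nll = sum(-math.log(max(p, eps)) for p in retain) / max(len(retain), 1)
    return suppress + retain_weight * retain_nll


# Usage: only the two middle tokens are flagged for forgetting.
loss = selective_cot_unlearning_loss([0.9, 0.8, 0.95, 0.7], [False, True, True, False])
```

Setting every entry of `forget_mask` to `True` turns this into whole-chain unlearning. In that case, the retained reasoning steps are suppressed along with the sensitive ones, which is the failure mode the abstract describes.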
Findings
Removes sensitive reasoning segments effectively on synthetic datasets.
Maintains reasoning performance while suppressing forgotten content on medical datasets.
Outperforms existing unlearning methods in preserving general reasoning capabilities.
Abstract
Large Reasoning Models (LRMs) generate structured chains of thought (CoTs) before producing final answers, which makes them especially vulnerable to knowledge leakage through intermediate reasoning steps. At the same time, the memorization of sensitive information in training data, such as copyrighted and private content, has raised ethical and legal concerns. Selective forgetting (also known as machine unlearning) has emerged as a potential remedy for LRMs. However, existing unlearning methods primarily target final answers and may degrade the overall reasoning ability of LRMs after forgetting. Applying unlearning directly to entire CoTs could likewise degrade general reasoning capabilities. The key challenge for LRM unlearning therefore lies in precisely unlearning targeted knowledge while preserving the integrity of general reasoning capabilities. To bridge…
