Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills

Changsheng Wang; Chongyu Fan; Yihua Zhang; Jinghan Jia; Dennis Wei; Parikshit Ram; Nathalie Baracaldo; Sijia Liu

arXiv:2506.12963 · cs.AI · October 14, 2025

TL;DR

This paper introduces R2MU, an unlearning method for large reasoning models (LRMs) that erases sensitive information from both the reasoning traces and the final answers while preserving reasoning ability, addressing a safety risk specific to chain-of-thought models.

Contribution

The paper presents R2MU, a new approach for unlearning in LRMs that targets reasoning traces, not just final answers, improving safety without sacrificing reasoning skills.

Findings

1. R2MU significantly reduces sensitive information leakage in reasoning traces.
2. R2MU maintains reasoning performance while erasing sensitive data.
3. Experiments on state-of-the-art models validate the effectiveness of R2MU.
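The first finding concerns leakage inside reasoning traces, which a check on final answers alone cannot see. As a rough illustration, the sketch below scores a batch of model outputs separately for the trace and for the final answer. It assumes the trace is delimited by `<think>...</think>` tags and uses case-insensitive keyword matching as a crude leakage proxy; both choices, the function names, and the `codename-x` forget term are hypothetical and are not the paper's evaluation protocol.

```python
import re
from dataclasses import dataclass

# Assumption: the chain of thought is wrapped in <think>...</think>;
# everything after the closing tag is treated as the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


@dataclass
class SplitResponse:
    trace: str
    answer: str


def split_response(text: str) -> SplitResponse:
    """Separate the reasoning trace from the final answer."""
    m = THINK_RE.search(text)
    if m is None:
        # No explicit trace: treat the whole output as the answer.
        return SplitResponse(trace="", answer=text.strip())
    return SplitResponse(trace=m.group(1).strip(), answer=m.group(2).strip())


def leaks(text: str, forget_terms: list[str]) -> bool:
    """Hypothetical proxy: does any forget-set keyword appear (case-insensitive)?"""
    lowered = text.lower()
    return any(term.lower() in lowered for term in forget_terms)


def leakage_rates(responses: list[str], forget_terms: list[str]) -> dict[str, float]:
    """Fraction of responses that leak in the trace vs. in the final answer."""
    if not responses:
        return {"trace": 0.0, "answer": 0.0}
    splits = [split_response(r) for r in responses]
    n = len(splits)
    trace_hits = sum(leaks(s.trace, forget_terms) for s in splits)
    answer_hits = sum(leaks(s.answer, forget_terms) for s in splits)
    return {"trace": trace_hits / n, "answer": answer_hits / n}


responses = [
    "<think>This is about codename-x, so the answer is B.</think>I can't help with that.",
    "<think>Recall standard lab safety.</think>Wear gloves.",
]
print(leakage_rates(responses, ["codename-x"]))  # {'trace': 0.5, 'answer': 0.0}
```

Here the first output refuses in its final answer yet still names the forgotten item in its trace, so the trace leakage rate is 0.5 while the answer leakage rate is 0.0: the pattern an answer-only audit would miss.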

Abstract

Recent advances in large reasoning models (LRMs) have enabled strong chain-of-thought (CoT) generation through test-time computation. While these multi-step reasoning capabilities represent a major milestone in language model performance, they also introduce new safety risks. In this work, we present the first systematic study to revisit the problem of machine unlearning in the context of LRMs. Machine unlearning refers to the process of removing the influence of sensitive, harmful, or undesired data or knowledge from a trained model without full retraining. We show that conventional unlearning algorithms, originally designed for non-reasoning models, are inadequate for LRMs. In particular, even when final answers are successfully erased, sensitive information often persists within the intermediate reasoning steps, i.e., CoT trajectories. To address this challenge, we extend…
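For reference on what a "conventional" unlearning algorithm looks like, the sketch below evaluates the objective of RMU (Representation Misdirection for Unlearning), an existing method designed for non-reasoning models: hidden states on forget-set tokens are pushed toward a scaled random unit vector `c * u`, while hidden states on retain-set tokens are kept close to those of the frozen original model. This excerpt does not describe how R2MU extends such objectives, so the code models only the baseline. It works on toy activation vectors rather than a real model, and the function names and the default values of `c` and `alpha` are illustrative assumptions, not tuned settings from any paper.

```python
import math
import random

Vector = list[float]


def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    """Sample entries uniformly in [0, 1), then L2-normalize to get u."""
    u = [rng.random() for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared L2 distance between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rmu_loss(
    forget_acts: list[Vector],         # updated model, forget-set tokens
    retain_acts: list[Vector],         # updated model, retain-set tokens
    frozen_retain_acts: list[Vector],  # frozen original model, same retain tokens
    u: Vector,
    c: float = 5.0,       # illustrative steering scale (assumption)
    alpha: float = 1.0,   # illustrative retain weight (assumption)
) -> float:
    """Forget term: mean distance to c*u. Retain term: mean drift from the frozen model."""
    target = [c * x for x in u]
    forget = sum(sq_dist(h, target) for h in forget_acts) / len(forget_acts)
    retain = sum(
        sq_dist(h, h0) for h, h0 in zip(retain_acts, frozen_retain_acts)
    ) / len(retain_acts)
    return forget + alpha * retain


rng = random.Random(0)
u = random_unit_vector(4, rng)
print(rmu_loss([[0.1, 0.2, 0.3, 0.4]], [[1.0, 0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0, 0.0]], u))
```

In practice the vectors would be hidden states taken from a chosen layer of the model and the loss would be minimized by gradient descent on the model's weights; the sketch only computes the objective so that its two terms (misdirect on forget data, preserve on retain data) are explicit.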

Videos

Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills (Underline)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning