R-TOFU: Unlearning in Large Reasoning Models

Sangyeon Yoon; Wonje Jeung; Albert No

arXiv:2505.15214 · cs.CL · May 28, 2025

TL;DR

This paper introduces R-TOFU, a benchmark for evaluating unlearning in large reasoning models. It shows how hard it is to remove private information from multi-step reasoning traces and proposes Reasoned IDK, a preference-optimization method, to improve unlearning effectiveness.

Contribution

The paper presents R-TOFU, the first benchmark for unlearning in reasoning models, and proposes Reasoned IDK, a preference-optimization method that better balances forgetting and utility.

Findings

01. Answer-only objectives leave residual knowledge in reasoning traces.

02. Decoding variants can still reveal forgotten content despite unlearning.

03. R-TOFU provides a systematic foundation for studying unlearning in LRMs.

Abstract

Large Reasoning Models (LRMs) embed private or copyrighted information not only in their final answers but also throughout multi-step chain-of-thought (CoT) traces, making reliable unlearning far more demanding than in standard LLMs. We introduce Reasoning-TOFU (R-TOFU), the first benchmark tailored to this setting. R-TOFU augments existing unlearning tasks with realistic CoT annotations and provides step-wise metrics that expose residual knowledge invisible to answer-level checks. Using R-TOFU, we carry out a comprehensive comparison of gradient-based and preference-optimization baselines and show that conventional answer-only objectives leave substantial forget traces in reasoning. We further propose Reasoned IDK, a preference-optimization variant that preserves coherent yet inconclusive reasoning, achieving a stronger balance between forgetting efficacy and model utility than earlier…
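
To make the gap between answer-level and step-wise checks concrete, here is a minimal sketch in Python. It is not the metric defined in the paper: the substring match, the function names `step_leaks` and `stepwise_leak_rate`, and the toy trace about a fictitious person are all assumptions chosen for illustration.

```python
from typing import List


def step_leaks(text: str, forget_terms: List[str]) -> bool:
    """Return True if any forget term appears in the text (case-insensitive).

    Assumption: leakage is approximated by plain substring matching.
    """
    lowered = text.lower()
    return any(term.lower() in lowered for term in forget_terms)


def stepwise_leak_rate(cot_steps: List[str], forget_terms: List[str]) -> float:
    """Fraction of reasoning steps that mention at least one forget term."""
    if not cot_steps:
        return 0.0
    leaked = sum(step_leaks(step, forget_terms) for step in cot_steps)
    return leaked / len(cot_steps)


# Hypothetical output of an "unlearned" model: the final answer refuses,
# but one reasoning step still states the fact that should be forgotten.
cot = [
    "The question asks where Jane Roe was born.",
    "I recall Jane Roe grew up in Lisbon.",
    "However, I should not disclose this.",
]
answer = "I don't know."
terms = ["Lisbon"]

answer_leaks = step_leaks(answer, terms)      # answer-level check
trace_rate = stepwise_leak_rate(cot, terms)   # step-wise check

print(f"answer leaks: {answer_leaks}, trace leak rate: {trace_rate:.2f}")
```

In this toy case the answer-level check passes while the step-wise check flags one of three reasoning steps, which is the kind of residual knowledge an answer-only evaluation would miss.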

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Machine Learning and Data Classification · AI-based Problem Solving and Planning