CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization

Junyi Li; Yongqiang Chen; Ningning Ding

arXiv:2604.15847 · cs.CL · April 20, 2026

TL;DR

CiPO introduces an iterative preference optimization framework to effectively unlearn specific knowledge from large reasoning models without impairing their reasoning capabilities.

Contribution

The paper proposes a novel counterfactual unlearning method that targets reasoning traces, enabling complete knowledge removal while maintaining reasoning performance.

Findings

01

CiPO successfully removes undesired knowledge from reasoning traces.

02

The method preserves the reasoning abilities of large models after unlearning.

03

Experiments show superior unlearning effectiveness on challenging benchmarks.

Abstract

Machine unlearning has gained increasing attention in recent years as a promising technique for selectively removing unwanted private or copyrighted information from Large Language Models (LLMs) trained on massive amounts of human data. However, the emergence of Large Reasoning Models (LRMs), which emphasize long chain-of-thought (CoT) reasoning to address complex questions, presents a dilemma for unlearning: existing methods either struggle to completely eliminate undesired knowledge from the CoT traces or degrade reasoning performance by interfering with the reasoning process. To address this, we introduce Counterfactual Unlearning through iterative Preference Optimization (CiPO), a novel framework that redefines unlearning as a targeted intervention in the CoT reasoning of LRMs. More specifically, given a desired unlearning target answer, CiPO instructs LRMs to generate a…
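The abstract is truncated before it states CiPO's training objective. Purely as a hypothetical illustration of the kind of preference-optimization step the method's name refers to, the sketch below implements the standard Direct Preference Optimization (DPO) loss over pairs of full responses (reasoning trace plus answer). Treating a counterfactual trace as the preferred ("chosen") response and the original trace as the dispreferred ("rejected") one is an assumption, not the paper's confirmed formulation; the names `dpo_loss`, `softplus`, and `beta` are illustrative.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: Sequence[float],
    policy_rejected_logp: Sequence[float],
    ref_chosen_logp: Sequence[float],
    ref_rejected_logp: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Mean standard DPO loss over a batch of (chosen, rejected) pairs.

    Each entry is the summed token log-probability of one full response
    under the trainable policy or the frozen reference model.
    Hypothetical unlearning framing (an assumption, not CiPO's stated
    objective): "chosen" = counterfactual trace, "rejected" = original trace.
    """
    n = len(policy_chosen_logp)
    if not (n == len(policy_rejected_logp) == len(ref_chosen_logp) == len(ref_rejected_logp)):
        raise ValueError("all inputs must have the same length")
    if n == 0:
        raise ValueError("empty batch")

    total = 0.0
    for pc, pr, rc, rr in zip(
        policy_chosen_logp, policy_rejected_logp, ref_chosen_logp, ref_rejected_logp
    ):
        # Implicit reward margin: how much more the policy, relative to the
        # reference model, prefers the chosen response over the rejected one.
        margin = beta * ((pc - rc) - (pr - rr))
        # -log(sigmoid(margin)) is equal to softplus(-margin).
        total += softplus(-margin)
    return total / n
```

When the policy equals the reference model, every margin is zero and the loss equals log 2; raising the policy's log-probability of the chosen response relative to the rejected one pushes the loss below log 2. The reference-model terms keep the policy from drifting arbitrarily far from its starting point, which is the property one would want if the goal is to change specific traces while preserving general reasoning ability.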