Controllable Reasoning Models Are Private Thinkers

Haritz Puerto; Haonan Li; Xudong Han; Timothy Baldwin; Iryna Gurevych

arXiv:2602.24210 · cs.CL · March 2, 2026

TL;DR

This paper trains reasoning models to follow instructions in their reasoning traces as well as in their final answers. Doing so yields privacy gains of up to 51.9 percentage points on privacy benchmarks, at some cost in utility.

Contribution

The paper proposes fine-tuning reasoning models on a new instruction-following dataset with explicit restrictions on reasoning traces, together with a generation strategy that decouples reasoning and answer generation using separate LoRA adapters.

Findings

01. Up to a 20.9-point improvement in instruction-following performance.

02. Up to a 51.9 percentage point improvement on privacy benchmarks.

03. Decoupling reasoning and answer generation enhances privacy.

Abstract

AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose training models to follow instructions not only in the final answer, but also in reasoning traces, potentially under different constraints. We hypothesize that improving their instruction following abilities in the reasoning traces can improve their privacy-preservation skills. To demonstrate this, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following…
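The decoupled generation strategy mentioned in the abstract can be pictured as a two-stage loop: one adapter produces the reasoning trace, then a second adapter produces the final answer conditioned on that trace. The sketch below is illustrative only and is not the authors' implementation. The `<think>` delimiters, the `generate_decoupled` function, and the toy generator callables are all assumptions introduced here; with a real model, each callable might wrap the same base model with a different LoRA adapter activated.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a generator maps a prompt string to a text continuation.
Generator = Callable[[str], str]

# Assumed reasoning delimiters; actual models may use different markers.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


@dataclass
class DecoupledOutput:
    reasoning: str
    answer: str


def generate_decoupled(
    prompt: str, reasoning_gen: Generator, answer_gen: Generator
) -> DecoupledOutput:
    """Two-stage generation with separate generators for trace and answer."""
    # Stage 1: the reasoning generator continues from the opening delimiter.
    raw = reasoning_gen(prompt + THINK_OPEN)
    # Keep only the text before the closing delimiter; discard anything after.
    reasoning = raw.split(THINK_CLOSE, 1)[0].strip()

    # Stage 2: the answer generator sees the prompt plus the closed trace.
    context = f"{prompt}{THINK_OPEN}{reasoning}{THINK_CLOSE}"
    answer = answer_gen(context).strip()
    return DecoupledOutput(reasoning=reasoning, answer=answer)


# Toy stand-ins for adapter-specific generation (hypothetical, no model used).
def toy_reasoning_gen(text: str) -> str:
    return " The request is a booking; no personal details are needed.</think>ignored tail"


def toy_answer_gen(text: str) -> str:
    return " Booking confirmed."


result = generate_decoupled("Book a table for two.", toy_reasoning_gen, toy_answer_gen)
print(result.reasoning)
print(result.answer)
```

Keeping the two stages behind separate callables means the constraints applied to the reasoning trace and to the answer can differ, which is the property the abstract highlights.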

Taxonomy

Topics: Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data