Controllable Reasoning Models Are Private Thinkers
Haritz Puerto, Haonan Li, Xudong Han, Timothy Baldwin, Iryna Gurevych

TL;DR
This paper improves privacy in reasoning models by training them to follow instructions in their reasoning traces, not only in their final answers. The result is significant privacy gains at some cost to utility.
Contribution
The paper proposes a novel training approach that enhances privacy preservation in reasoning models by controlling their reasoning traces through instruction-following.
Findings
Up to a 20.9-point improvement in instruction-following performance.
Up to a 51.9-percentage-point improvement on privacy benchmarks.
Decoupling reasoning and answer generation enhances privacy.
Abstract
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose training models to follow instructions not only in the final answer, but also in reasoning traces, potentially under different constraints. We hypothesize that improving their instruction following abilities in the reasoning traces can improve their privacy-preservation skills. To demonstrate this, we fine-tune models on a new instruction-following dataset with explicit restrictions on reasoning traces. We further introduce a generation strategy that decouples reasoning and answer generation using separate LoRA adapters. We evaluate our approach on six models from two model families, ranging from 1.7B to 14B parameters, across two instruction-following…
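The decoupled generation strategy mentioned in the abstract can be pictured as a two-stage pipeline: one generator writes the reasoning trace, and a second one writes the final answer conditioned on that trace. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The function `decoupled_generate`, the two callables, and the use of `<think>`/`</think>` as trace delimiters are all illustrative choices; the callables stand in for the same base model running with two different LoRA adapters.

```python
from typing import Callable, Dict

# Assumed delimiters for the reasoning trace (illustrative, not taken from the paper).
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def decoupled_generate(
    prompt: str,
    reasoning_step: Callable[[str], str],
    answer_step: Callable[[str], str],
) -> Dict[str, str]:
    """Produce a reasoning trace and a final answer with two separate generators.

    Both callables are hypothetical stand-ins: each takes the text so far and
    returns its continuation, e.g. a base model with a different adapter active.
    """
    # Stage 1: the reasoning generator continues the prompt after the opening tag.
    raw_trace = reasoning_step(prompt + THINK_OPEN)
    # Keep only the text before the closing tag, so anything the stage-1
    # generator wrote after the trace is discarded.
    trace = raw_trace.split(THINK_CLOSE, 1)[0].strip()

    # Stage 2: the answer generator sees the prompt plus the closed trace.
    context = f"{prompt}{THINK_OPEN}\n{trace}\n{THINK_CLOSE}\n"
    answer = answer_step(context).strip()
    return {"reasoning": trace, "answer": answer}


# Toy stand-ins so the sketch runs without any model weights.
def fake_reasoning(text: str) -> str:
    return "The user wants a summary; no names are needed.</think>ignored tail"


def fake_answer(text: str) -> str:
    return " Here is the summary. "


result = decoupled_generate("Summarize my inbox.\n", fake_reasoning, fake_answer)
print(result["reasoning"])
print(result["answer"])
```

In a real setup, each callable might wrap a single base model and switch the active adapter before calling generation (for example via PEFT's `set_adapter`). That wiring is an assumption here and is left out of the sketch.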
Code & Models
- 🤗haritzpuerto/unsloth-Qwen3-1.7B-IF-RTmodel· 4 dl4 dl
- 🤗haritzpuerto/unsloth-Qwen3-1.7B-IF-FAmodel· 1 dl1 dl
- 🤗haritzpuerto/unsloth-Qwen3-4B-IF-RTmodel· 2 dl2 dl
- 🤗haritzpuerto/unsloth-Qwen3-4B-IF-FAmodel· 2 dl2 dl
- 🤗haritzpuerto/unsloth-Qwen3-8B-IF-RTmodel· 2 dl2 dl
- 🤗haritzpuerto/unsloth-Qwen3-8B-IF-FAmodel· 3 dl3 dl
- 🤗haritzpuerto/unsloth-Qwen3-14B-IF-RTmodel· 2 dl2 dl
- 🤗haritzpuerto/unsloth-Qwen3-14B-IF-FAmodel· 3 dl3 dl
- 🤗haritzpuerto/unsloth-Qwen3-14B-IF-Avgmodel· 2 dl2 dl
- 🤗haritzpuerto/unsloth-Phi-4-3.8B-IF-RTmodel· 1 dl1 dl
Taxonomy
Topics: Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
