Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models
Arghyadeep Das, Sai Sreenivas Chintha, Rishiraj Girmal, Kinjal Pandey, Sharvi Endait

TL;DR
This paper addresses privacy risks in large reasoning models by developing methods to prevent PII leakage during chain-of-thought reasoning, introducing a new dataset and evaluation benchmark for privacy-aware reasoning.
Contribution
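The authors propose a privacy-first reasoning approach built on deployable interventions rather than post-hoc redaction, introduce PII-CoT-Bench (a supervised dataset with privacy-aware CoT annotations plus a category-balanced evaluation benchmark), and evaluate strategies for reducing PII leakage with minimal utility loss.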
Findings
State-of-the-art models benefit most from prompt-based controls.
Weaker models require fine-tuning to achieve meaningful leakage reduction.
Both methods significantly lower PII exposure with little impact on utility.
Abstract
Large Reasoning Models (LRMs) improve performance, reliability, and interpretability by generating explicit chain-of-thought (CoT) reasoning, but this transparency introduces a serious privacy risk: intermediate reasoning often leaks personally identifiable information (PII) even when final answers are sanitized. We study how to induce privacy-first reasoning, where models reason without exposing sensitive information, using deployable interventions rather than post-hoc redaction. We introduce PII-CoT-Bench, a supervised dataset with privacy-aware CoT annotations, and a category-balanced evaluation benchmark covering realistic and adversarial leakage scenarios. Our results reveal a capability-dependent trend: state-of-the-art models benefit most from prompt-based controls, whereas weaker models require fine-tuning to achieve meaningful leakage reduction. Across models and categories,…
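To make the risk described in the abstract concrete, the following is a minimal sketch of how leakage in a reasoning trace could be checked separately from leakage in the final answer. It is not the paper's detection method or the PII-CoT-Bench metric: the regex patterns, the three PII categories, and the function names `find_pii`, `leakage_report`, and `cot_leak_rate` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only. The paper's PII categories
# and detection procedure are not given in this summary.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a sorted list of (category, matched_text) pairs found in text."""
    hits = []
    for category, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((category, match.group()))
    return sorted(hits)


def leakage_report(cot, answer):
    """Check the reasoning trace and the final answer independently."""
    return {"cot": find_pii(cot), "answer": find_pii(answer)}


def cot_leak_rate(samples):
    """Fraction of (cot, answer) pairs whose reasoning trace contains PII."""
    if not samples:
        return 0.0
    leaked = sum(1 for cot, _ in samples if find_pii(cot))
    return leaked / len(samples)
```

Under these assumptions, a response whose final answer is clean can still be flagged when its reasoning trace repeats an email address or phone number, which is the case the abstract highlights. A regex check of this kind only catches pattern-shaped PII; names, addresses, and contextual identifiers would need a different detector.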
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
