Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models

Arghyadeep Das; Sai Sreenivas Chintha; Rishiraj Girmal; Kinjal Pandey; Sharvi Endait

arXiv:2601.05076·cs.AI·January 9, 2026

TL;DR

This paper studies how to keep large reasoning models from leaking personally identifiable information (PII) in their chain-of-thought reasoning, using deployable interventions rather than post-hoc redaction. It also introduces PII-CoT-Bench, a supervised dataset and evaluation benchmark for privacy-aware reasoning.

Contribution

The authors propose a privacy-first reasoning approach based on deployable interventions, introduce the PII-CoT-Bench dataset, and evaluate strategies for reducing PII leakage with minimal utility loss.

Findings

01. Prompt-based controls are most effective for strong models.

02. Fine-tuning is necessary for weaker models to reduce leakage.

03. Both methods significantly lower PII exposure with little impact on utility.

Abstract

Large Reasoning Models (LRMs) improve performance, reliability, and interpretability by generating explicit chain-of-thought (CoT) reasoning, but this transparency introduces a serious privacy risk: intermediate reasoning often leaks personally identifiable information (PII) even when final answers are sanitized. We study how to induce privacy-first reasoning, where models reason without exposing sensitive information, using deployable interventions rather than post-hoc redaction. We introduce PII-CoT-Bench, a supervised dataset with privacy-aware CoT annotations, and a category-balanced evaluation benchmark covering realistic and adversarial leakage scenarios. Our results reveal a capability-dependent trend: state-of-the-art models benefit most from prompt-based controls, whereas weaker models require fine-tuning to achieve meaningful leakage reduction. Across models and categories,…
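To make the leakage scenario concrete, here is a minimal sketch of how PII in a reasoning trace might be detected separately from PII in the final answer. It is an illustration only, not the paper's method or the PII-CoT-Bench scoring procedure: the names `PII_PATTERNS`, `find_pii`, and `leakage_report`, the two regex detectors, and the example strings are all assumptions made for this sketch.

```python
import re

# Hypothetical, deliberately simple detectors (assumption for illustration);
# they do not reflect the PII categories used in the paper's benchmark.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a sorted list of (category, matched_string) pairs found in text."""
    hits = set()
    for category, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.add((category, match.group(0)))
    return sorted(hits)


def leakage_report(reasoning, answer):
    """Scan the reasoning trace and the final answer independently."""
    return {"reasoning": find_pii(reasoning), "answer": find_pii(answer)}


# Made-up example: the final answer is clean, but the reasoning repeats PII.
example_reasoning = (
    "The user jane.doe@example.com has SSN 123-45-6789, so she is eligible."
)
example_answer = "The user is eligible."

report = leakage_report(example_reasoning, example_answer)
print(report)
```

Scanning the two fields separately is the point of the sketch: a check that inspects only the final answer would report nothing here, while the reasoning trace still exposes both identifiers.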

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks