Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs
Patrick Ahrend, Tobias Eder, Xiyang Yang, Zhiyi Pan, Georg Groh

TL;DR
This paper investigates how Chain-of-Thought prompting in large language models can lead to privacy leaks of personally identifiable information, and evaluates methods to detect and mitigate such leakage effectively.
Contribution
The paper introduces a model-agnostic framework for measuring PII leakage in reasoning traces and benchmarks several gatekeeping strategies that trade off privacy against utility.
Findings
CoT increases PII leakage, especially for high-risk categories.
Leakage varies significantly with model family and reasoning budget.
No single gatekeeper method outperforms others across all scenarios.
Abstract
Chain-of-Thought (CoT) prompting improves LLM reasoning but can increase privacy risk by resurfacing personally identifiable information (PII) from the prompt into reasoning traces and outputs, even under policies that instruct the model not to restate PII. We study such direct, inference-time PII leakage using a model-agnostic framework that (i) defines leakage as risk-weighted, token-level events across 11 PII types, (ii) traces leakage curves as a function of the allowed CoT budget, and (iii) compares open- and closed-source model families on a structured PII dataset with a hierarchical risk taxonomy. We find that CoT consistently elevates leakage, especially for high-risk categories, and that leakage is strongly family- and budget-dependent. Increasing the reasoning budget can either amplify or attenuate leakage depending on the base model. We then benchmark lightweight…
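As a rough illustration of point (i) in the abstract, risk-weighted, token-level leakage can be sketched as below. This is not the authors' metric: the risk weights, the whitespace tokenizer, and the names `RISK_WEIGHTS`, `leakage_score`, and `leakage_curve` are all hypothetical. The curve is also only a proxy for point (ii), since it truncates one fixed trace, whereas the abstract describes varying the allowed CoT budget.

```python
from typing import Dict, List, Tuple

# Hypothetical risk weights for three PII types. The paper covers 11 types
# with a hierarchical risk taxonomy; its actual values are not given here.
RISK_WEIGHTS: Dict[str, float] = {"name": 1.0, "email": 2.0, "ssn": 3.0}


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: lowercase, whitespace-separated tokens.
    return text.lower().split()


def leakage_score(pii: List[Tuple[str, str]], trace: str) -> float:
    """Sum risk weights over trace tokens that match a PII token from the prompt."""
    trace_tokens = tokenize(trace)
    score = 0.0
    for pii_type, value in pii:
        pii_tokens = set(tokenize(value))
        # Each matching trace token counts as one leakage event.
        hits = sum(1 for tok in trace_tokens if tok in pii_tokens)
        # Unknown PII types fall back to a weight of 1.0 (an assumption).
        score += RISK_WEIGHTS.get(pii_type, 1.0) * hits
    return score


def leakage_curve(
    pii: List[Tuple[str, str]], trace: str, budgets: List[int]
) -> List[Tuple[int, float]]:
    """Leakage score of the trace truncated to each token budget."""
    tokens = trace.split()
    return [(b, leakage_score(pii, " ".join(tokens[:b]))) for b in budgets]


if __name__ == "__main__":
    pii = [("name", "Alice Smith"), ("ssn", "123-45-6789")]
    trace = "The user Alice Smith has SSN 123-45-6789 so Alice qualifies"
    print(leakage_score(pii, trace))
    print(leakage_curve(pii, trace, [2, 4, 10]))
```

Exact token matching is the simplest possible detector. It misses paraphrased or partially masked PII, so a real measurement would need a stronger matcher.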
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Security and Verification in Computing · Advanced Graph Neural Networks
