Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models

Anmol Goel; Cornelius Emde; Sangdoo Yun; Seong Joon Oh; Martin Gubri

arXiv:2601.15220 · cs.CL · April 21, 2026

TL;DR

Benign fine-tuning of language models can cause privacy collapse: fine-tuned models expose sensitive information inappropriately while still performing well on standard benchmarks.

Contribution

This paper uncovers a new privacy vulnerability in fine-tuned language models, showing how subtle patterns in benign training data can lead to contextual privacy violations without degrading utility.

Findings

01

Privacy collapse is observed across six models and five fine-tuning datasets.

02

Fine-tuning degrades models' ability to reason about contextual privacy norms.

03

Privacy representations are more fragile than task-relevant features.

Abstract

We identify a novel phenomenon in language models: benign fine-tuning of frontier models can lead to privacy collapse. We find that diverse, subtle patterns in training data can degrade contextual privacy, including optimisation for helpfulness, exposure to user information, emotional and subjective dialogue, and debugging code printing internal variables, among others. Fine-tuned models lose their ability to reason about contextual privacy norms, share information inappropriately with tools, and violate memory boundaries across contexts. Privacy collapse is a "silent failure" because models maintain high performance on standard safety and utility benchmarks whilst exhibiting severe privacy vulnerabilities. Our experiments show evidence of privacy collapse across six models (closed and open weight), five fine-tuning datasets (real-world and controlled data), and two task categories…
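To make the "silent failure" criterion concrete, here is a minimal sketch, assuming each model (base and fine-tuned) has been scored in [0, 1] on a utility benchmark, a safety benchmark and a contextual-privacy benchmark. The `EvalScores` type, the `is_silent_failure` function and both thresholds are hypothetical illustrations, not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class EvalScores:
    """Hypothetical benchmark scores in [0, 1]; higher is better for all three."""
    utility: float  # score on a standard utility benchmark
    safety: float   # score on a standard safety benchmark
    privacy: float  # rate of contextually appropriate information handling


def is_silent_failure(base: EvalScores, tuned: EvalScores,
                      tolerance: float = 0.02, privacy_drop: float = 0.10) -> bool:
    """Return True when the fine-tuned model keeps its standard scores
    (each within `tolerance` of the base model) while its privacy score
    falls by at least `privacy_drop`. Both thresholds are illustrative assumptions."""
    standard_ok = (tuned.utility >= base.utility - tolerance
                   and tuned.safety >= base.safety - tolerance)
    privacy_collapsed = (base.privacy - tuned.privacy) >= privacy_drop
    return standard_ok and privacy_collapsed
```

The check deliberately requires both conditions: a model that loses utility or safety alongside privacy would already be caught by standard benchmarks, so it is not "silent".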

Code & Models

Repositories

parameterlab/privacy-collapse (GitHub)