On Membership Inference Attacks in Knowledge Distillation
Ziyao Cui, Minxing Zhang, Jian Pei

TL;DR
This paper investigates how knowledge distillation impacts privacy risks in large language models, revealing that distillation can increase vulnerability to membership inference attacks and proposing interventions to mitigate this risk.
Contribution
The paper provides the first systematic evaluation of membership inference attack (MIA) vulnerability in distilled LLMs and introduces practical methods that reduce this vulnerability without sacrificing utility.
Findings
Distilled models do not always have lower MIA success than teacher models.
Distillation can sometimes increase MIA vulnerability because of mixed supervision, in which the student learns from both the teacher's predictions and the ground-truth labels.
Proposed interventions effectively reduce MIA success while maintaining model utility.
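To make "mixed supervision" concrete, here is a minimal sketch of a standard distillation objective. It blends a hard-label cross-entropy term (ground truth) with a temperature-scaled KL term toward the teacher, following the usual Hinton et al. formulation. This is an assumption for illustration only: the paper's exact loss, weighting, and temperature are not given here, and the names `mixed_distillation_loss`, `alpha`, and `temperature` are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,
    temperature: float = 2.0,
) -> float:
    """Hypothetical per-token loss mixing ground-truth and teacher supervision."""
    # Hard-label term: cross-entropy of the student against the ground-truth token.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so its magnitude stays comparable to the hard term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    soft *= temperature ** 2
    # alpha weights the ground truth; (1 - alpha) weights the teacher.
    return alpha * hard + (1.0 - alpha) * soft


# Toy logits over a 3-token vocabulary; the ground-truth token is index 0.
teacher = [3.0, 0.5, 0.0]
student = [1.0, 1.0, 0.0]
print(mixed_distillation_loss(student, teacher, label=0))
```

In this sketch, when the teacher's top prediction coincides with the ground-truth label, both terms push the student toward the same token, so the two supervision signals reinforce each other on that example instead of tempering each other.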
Abstract
Large language models (LLMs) are trained on massive corpora that may contain sensitive information, creating privacy risks under membership inference attacks (MIAs). Knowledge distillation is widely used to compress LLMs into smaller student models, but its privacy implications are poorly understood. We systematically evaluate how distillation affects MIA vulnerability across six teacher-student model pairs and six attack methods. We find that distilled student models do not consistently exhibit lower MIA success than their teacher models, and in some cases demonstrate substantially higher member-specific attack success, challenging the assumption that knowledge distillation inherently improves privacy. We attribute this to mixed supervision in distillation: for vulnerable training data points, teacher predictions often align with ground-truth labels, causing student models to learn…
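Many membership inference attacks share a simple premise: a model tends to assign lower loss to examples it was trained on. Below is a minimal sketch of the classic loss-threshold ("LOSS") attack, plus a threshold-free AUC for comparing a teacher and a student on the same member/non-member split. It is a standard baseline chosen for illustration; the abstract does not name the six attacks the paper evaluates, and the function names and toy loss values here are hypothetical.

```python
from typing import Sequence


def loss_attack(losses: Sequence[float], threshold: float) -> list[bool]:
    """LOSS attack: predict 'member' when the model's loss on an example is below threshold."""
    return [loss < threshold for loss in losses]


def tpr_fpr(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> tuple[float, float]:
    # True-positive rate: fraction of real members flagged as members.
    tpr = sum(loss_attack(member_losses, threshold)) / len(member_losses)
    # False-positive rate: fraction of non-members wrongly flagged as members.
    fpr = sum(loss_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr, fpr


def loss_auc(member_losses: Sequence[float], nonmember_losses: Sequence[float]) -> float:
    """P(a random member has lower loss than a random non-member); ties count 1/2."""
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


# Toy per-example losses (hypothetical, not from the paper).
members = [0.1, 0.2, 0.3]
nonmembers = [0.4, 0.5, 0.6]
print(tpr_fpr(members, nonmembers, threshold=0.35))  # (1.0, 0.0)
print(loss_auc(members, nonmembers))                 # 1.0
```

Computing `loss_auc` from the teacher's losses and from the student's losses over the same split is one way to probe the claim that a student is not automatically less exposed: a student AUC at or above the teacher's would indicate no privacy gain under this particular attack.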
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)
Methods: Focus · Knowledge Distillation
