On Membership Inference Attacks in Knowledge Distillation

Ziyao Cui; Minxing Zhang; Jian Pei

arXiv:2505.11837 · cs.LG · January 13, 2026

TL;DR

This paper investigates how knowledge distillation impacts privacy risks in large language models, revealing that distillation can increase vulnerability to membership inference attacks and proposing interventions to mitigate this risk.

Contribution

It provides the first systematic evaluation of MIA vulnerability in distilled LLMs and introduces practical methods to enhance privacy without sacrificing utility.

Findings

01 Distilled models do not always have lower MIA success than teacher models.

02 Distillation can sometimes increase MIA vulnerability due to mixed supervision effects.

03 Proposed interventions effectively reduce MIA success while maintaining model utility.
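
To make "MIA success" concrete, here is a minimal sketch of a loss-threshold membership inference attack, one common baseline attack. It is not taken from the paper: the function names, the threshold, and the toy loss values are all illustrative assumptions. The attack guesses "member" when a model's per-example loss is low, and success is summarized by the true-positive rate on members and the false-positive rate on non-members.

```python
def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) for each example whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_rates(member_losses, nonmember_losses, threshold):
    """Return (TPR, FPR) of the threshold attack on known members and non-members."""
    member_preds = loss_threshold_attack(member_losses, threshold)
    nonmember_preds = loss_threshold_attack(nonmember_losses, threshold)
    tpr = sum(member_preds) / len(member_preds)        # members correctly flagged
    fpr = sum(nonmember_preds) / len(nonmember_preds)  # non-members wrongly flagged
    return tpr, fpr


# Toy per-example losses (hypothetical values, not measured results).
member_losses = [0.2, 0.4, 0.9, 0.3]
nonmember_losses = [1.1, 0.5, 1.4, 0.8]

tpr, fpr = attack_rates(member_losses, nonmember_losses, threshold=0.6)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Running the same computation on a teacher's losses and on its student's losses gives one simple way to compare their exposure under this particular attack.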

Abstract

Large language models (LLMs) are trained on massive corpora that may contain sensitive information, creating privacy risks under membership inference attacks (MIAs). Knowledge distillation is widely used to compress LLMs into smaller student models, but its privacy implications are poorly understood. We systematically evaluate how distillation affects MIA vulnerability across six teacher-student model pairs and six attack methods. We find that distilled student models do not consistently exhibit lower MIA success than their teacher models, and in some cases demonstrate substantially higher member-specific attack success, challenging the assumption that knowledge distillation inherently improves privacy. We attribute this to mixed supervision in distillation: for vulnerable training data points, teacher predictions often align with ground-truth labels, causing student models to learn…
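
The "mixed supervision" mentioned in the abstract can be illustrated with the standard distillation objective, which blends a hard-label cross-entropy term with a soft-label term that matches the teacher's distribution. The sketch below assumes that common formulation; the paper's exact objective is not given in this excerpt, and `alpha`, `temperature`, and the function names are hypothetical choices for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_distillation_loss(student_logits, teacher_logits, label,
                            alpha=0.5, temperature=2.0):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).

    Assumed standard form; the T^2 factor is a common convention, not a
    detail confirmed by the source.
    """
    # Hard supervision: cross-entropy against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft supervision: KL divergence from teacher to student at temperature T.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Toy example: the teacher is confident in class 0, which is also the label,
# so both terms pull the student toward the same answer.
teacher_logits = [4.0, 0.5, -1.0]
student_logits = [1.0, 0.8, 0.2]
loss = mixed_distillation_loss(student_logits, teacher_logits, label=0)
print(f"mixed loss = {loss:.4f}")
```

When the teacher's prediction agrees with the ground-truth label, as in the toy example, the two terms reinforce each other on that training point, which is the situation the abstract associates with vulnerable data points.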

Code & Models

Repositories

richardcui18/mia_in_kd (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)

Methods: Focus · Knowledge Distillation