How Is Uncertainty Propagated in Knowledge Distillation?
Ziyao Cui, Jian Pei

TL;DR
This paper investigates how uncertainty propagates through knowledge distillation and proposes variance-aware strategies that improve the stability and fidelity of student models.
Contribution
It introduces variance-aware methods for knowledge distillation and distinguishes inter-student from intra-student uncertainty, with formal guarantees and empirical validation across model classes.
Findings
Standard distillation suppresses intra-student variance.
Variance-aware strategies reduce noise and hallucination in large language models.
The proposed methods improve the stability of student models and how faithfully they reflect uncertainty.
Abstract
Knowledge distillation transfers behavior from a teacher to a student model, but the process is inherently stochastic: teacher outputs, student training, and student inference can all be random. Collapsing these uncertainties to a single point estimate can distort what is learned. We systematically study how uncertainty propagates through knowledge distillation across three representative model classes--linear regression, feed-forward neural networks, and large language models (LLMs)--and propose simple corrections. We distinguish inter-student uncertainty (variance across independently distilled students) from intra-student uncertainty (variance of a single student's predictive distribution), showing that standard single-response knowledge distillation suppresses intra-student variance while leaving substantial inter-student variability. To address these mismatches, we introduce two…
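The distinction between inter-student and intra-student uncertainty can be illustrated with a toy linear-regression sketch. This is not the authors' experimental setup: the function names `distill_students` and `uncertainty_summary`, the scalar teacher weight, the Gaussian teacher noise, and all parameter values are assumptions chosen for illustration. Each student is fitted by least squares to a single noisy teacher response per input, so it outputs a point prediction.

```python
import numpy as np


def distill_students(n_students=200, n_train=50, teacher_w=2.0,
                     teacher_sigma=1.0, seed=0):
    """Fit many students, each on one noisy teacher response per input.

    Hypothetical setup: the teacher is y = teacher_w * x + Gaussian noise,
    and each student is a no-intercept least-squares fit.
    """
    rng = np.random.default_rng(seed)
    # All students share the same distillation inputs.
    x = rng.uniform(-1.0, 1.0, size=n_train)
    weights = np.empty(n_students)
    for k in range(n_students):
        # Single-response distillation: one stochastic teacher label per input.
        y = teacher_w * x + rng.normal(0.0, teacher_sigma, size=n_train)
        # Closed-form least-squares slope for a one-dimensional model.
        weights[k] = (x @ y) / (x @ x)
    return x, weights


def uncertainty_summary(weights, x0=1.0):
    """Return (inter-student variance, intra-student variance) at input x0."""
    preds = weights * x0
    # Inter-student: spread of predictions across independently distilled students.
    inter = float(preds.var(ddof=1))
    # Intra-student: a least-squares student emits a point prediction,
    # so its own predictive variance is zero by construction.
    intra = 0.0
    return inter, intra


if __name__ == "__main__":
    x, weights = distill_students()
    inter, intra = uncertainty_summary(weights, x0=1.0)
    print("teacher predictive variance:", 1.0)
    print("inter-student variance:", inter)
    print("intra-student variance:", intra)
```

In this toy setting the teacher's predictive variance is `teacher_sigma ** 2`, each student's own predictive variance is zero, and the students still disagree with one another because each saw a different draw of teacher noise. That matches, qualitatively, the mismatch the abstract describes for single-response distillation.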
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Explainable Artificial Intelligence (XAI)
