Knowledge Distillation Must Account for What It Loses

Wenshuo Wang

arXiv:2604.25110·cs.LG·May 7, 2026

TL;DR

This position paper argues that distilled student models should be evaluated not only on retained task scores but also on whether they preserve the teacher capabilities that make those scores reliable.

Contribution

The paper reframes distillation as a lossy projection and proposes a taxonomy of off-metric distillation losses, together with a reporting framework for accountable distillation.

Findings

1. Current evaluation often overlooks capability losses.

2. Losses in teacher capabilities are measurable and recurring.

3. The proposed framework improves transparency in distillation outcomes.

Abstract

This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet…
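To make the "lossy projection" point concrete, the following is a minimal sketch, not the paper's method. It assumes a hypothetical setup in which teacher and student predictions are available both on the headline task and on a separate probe set that targets a capability outside the headline metric. The names `RetentionReport`, `accuracy`, and `retention`, and all toy data, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RetentionReport:
    task_retention: float   # student/teacher accuracy on the headline task
    probe_retention: float  # student/teacher accuracy on the off-metric probe


def accuracy(preds, labels):
    """Fraction of predictions that equal the reference labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def retention(student_preds, teacher_preds, labels):
    """Student accuracy as a fraction of teacher accuracy on the same items."""
    teacher_acc = accuracy(teacher_preds, labels)
    if teacher_acc == 0:
        raise ValueError("teacher accuracy is zero; retention is undefined")
    return accuracy(student_preds, labels) / teacher_acc


# Toy data (hypothetical): the student matches the teacher on the headline task...
task_labels = [0, 1, 1, 0, 1]
task_teacher = [0, 1, 1, 0, 0]
task_student = [0, 1, 1, 0, 0]

# ...but diverges on a probe set for a capability the headline metric ignores.
probe_labels = [1, 0, 1, 1]
probe_teacher = [1, 0, 1, 1]
probe_student = [1, 1, 0, 1]

report = RetentionReport(
    task_retention=retention(task_student, task_teacher, task_labels),
    probe_retention=retention(probe_student, probe_teacher, probe_labels),
)
print(report)
```

In this toy case the headline retention is perfect while the probe retention is halved, which is the kind of gap a report limited to task scores would hide. Accuracy ratios are only one possible choice of measure; the paper's own taxonomy and reporting framework may use different ones.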
