A Note on Knowledge Distillation Loss Function for Object Classification
Defang Chen

TL;DR
This paper explores the knowledge distillation loss function in object classification, highlighting its relation to logits matching, output regularization, label smoothing, and entropy-based methods.
Contribution
It clarifies the connections between knowledge distillation and other regularization techniques, providing insights into its theoretical foundations.
Findings
Knowledge distillation acts as a form of output regularization.
It is closely related to label smoothing and entropy-based regularization.
The paper offers a theoretical perspective on the loss function's role in object classification.
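One common way to see the label smoothing connection named in the findings above is to treat label smoothing as distillation from a teacher that predicts a uniform distribution. The sketch below is a minimal illustration under that assumption, not the paper's own notation: the function names, the mixing weight `alpha`, and the example logits are all hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(target, probs):
    """Cross-entropy H(target, probs) over the last axis."""
    return -np.sum(target * np.log(probs), axis=-1)

def kd_loss(student_logits, teacher_probs, one_hot, alpha):
    """Assumed form: (1 - alpha) * CE(labels, student) + alpha * CE(teacher, student)."""
    q = softmax(student_logits)
    hard_term = cross_entropy(one_hot, q)
    soft_term = cross_entropy(teacher_probs, q)
    return (1.0 - alpha) * hard_term + alpha * soft_term

def label_smoothing_loss(student_logits, one_hot, alpha):
    """Cross-entropy against labels mixed with a uniform distribution."""
    k = one_hot.shape[-1]
    smoothed = (1.0 - alpha) * one_hot + alpha / k
    return cross_entropy(smoothed, softmax(student_logits))

# Hypothetical example: one sample, three classes, true class 0.
logits = np.array([[2.0, -1.0, 0.5]])
labels = np.array([[1.0, 0.0, 0.0]])
uniform_teacher = np.full_like(labels, 1.0 / labels.shape[-1])

loss_kd = kd_loss(logits, uniform_teacher, labels, alpha=0.1)
loss_ls = label_smoothing_loss(logits, labels, alpha=0.1)
print(np.allclose(loss_kd, loss_ls))
```

The two losses agree because cross-entropy is linear in its target argument, so mixing the targets is the same as mixing the two loss terms. A trained teacher replaces the uniform distribution with an input-dependent one, which is the sense in which distillation can be read as a form of output regularization.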
Abstract
This research note provides a quick introduction to the knowledge distillation loss function used in object classification. In particular, we discuss its connection to a previously proposed logits matching loss function. We further treat knowledge distillation as a specific form of output regularization and demonstrate its connection to label smoothing and entropy-based regularization.
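The connection to logits matching mentioned in the abstract can be checked numerically with the standard high-temperature argument: for zero-mean logits, the gradient of the temperature-softened distillation loss with respect to the student logits, rescaled by T^2, approaches the gradient of a squared-error loss on the logits as the temperature T grows. The sketch below assumes that standard formulation; the function names, the random logits, and the chosen temperatures are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_kd_gradient(student_logits, teacher_logits, T):
    """T^2 times the gradient of CE(softmax(v/T), softmax(z/T)) w.r.t. z.

    The unscaled gradient is (q - p) / T, so the T^2-scaled one is T * (q - p).
    """
    q = softmax(student_logits, T)
    p = softmax(teacher_logits, T)
    return T * (q - p)

def logits_matching_gradient(student_logits, teacher_logits):
    """Gradient of ||z - v||^2 / (2 * N) w.r.t. z, with N the number of classes."""
    n = student_logits.shape[-1]
    return (student_logits - teacher_logits) / n

# Hypothetical zero-mean logits for a 5-class problem.
rng = np.random.default_rng(0)
z = rng.normal(size=5)
z -= z.mean()
v = rng.normal(size=5)
v -= v.mean()

target = logits_matching_gradient(z, v)
gap_low_T = np.abs(scaled_kd_gradient(z, v, T=1.0) - target).max()
gap_high_T = np.abs(scaled_kd_gradient(z, v, T=1e4) - target).max()
print(gap_low_T, gap_high_T)
```

The gap at the high temperature should be much smaller than at T = 1, which is the sense in which logits matching appears as a limiting case of distillation. The zero-mean assumption matters: without it, the two gradients differ by a term that depends on the mean of the logits.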
Taxonomy
Topics: Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
Methods: Knowledge Distillation · Softmax
