A Unified Revisit of Temperature in Classification-Based Knowledge Distillation
Logan Frank, Jim Davis

TL;DR
This paper systematically investigates how the temperature parameter in classification-based knowledge distillation interacts with training components, offering practical guidance for selecting optimal temperatures across different training setups.
Contribution
It provides a unified analysis of temperature's role in knowledge distillation, revealing its dependence on training factors and offering guidance for better temperature selection.
Findings
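The appropriate temperature depends on other training components, including the optimizer and teacher pretraining/finetuning.
Common practices have drawbacks: grid search can be time-consuming, and adopting values from prior work may yield suboptimal student performance when training setups differ.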
The study offers practical recommendations for temperature tuning.
Abstract
A central idea of knowledge distillation is to expose relational structure embedded in the teacher's weights for the student to learn, which is often facilitated using a temperature parameter. Despite its widespread use, there remains limited understanding of how to select an appropriate temperature value, or how this value depends on other training elements such as optimizer, teacher pretraining/finetuning, etc. In practice, temperature is commonly chosen via grid search or by adopting values from prior work, which can be time-consuming or may lead to suboptimal student performance when training setups differ. In this work, we posit that temperature is closely linked to these training components and present a unified study that systematically examines such interactions. From analyzing these cross-connections, we identify and present common situations that have a pronounced impact on…
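For readers unfamiliar with the mechanism, the following is a minimal sketch of how temperature is typically applied in classification-based distillation. It assumes the standard formulation in which teacher and student logits are divided by a temperature T before the softmax and the KL divergence between the softened distributions is scaled by T². The paper's exact loss is not given in this excerpt, so this may differ from what the authors use. The function names and the logit values are illustrative only.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    Assumption: this is the commonly used formulation, not necessarily the paper's.
    """
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Made-up logits for a 3-class problem (illustrative only).
    teacher_logits = [8.0, 2.0, 1.0]
    student_logits = [5.0, 3.0, 0.5]
    for t in (1.0, 4.0, 16.0):
        probs = softmax_with_temperature(teacher_logits, t)
        loss = distillation_loss(student_logits, teacher_logits, t)
        print(f"T={t:>4}: teacher probs={[round(x, 3) for x in probs]}, loss={loss:.4f}")
```

At low temperature the teacher's distribution is nearly one-hot, so the relative scores of the non-target classes contribute little to the loss. Raising the temperature spreads probability mass across those classes, which is the relational structure the abstract refers to.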
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Visual and Cognitive Learning Processes · Innovative Teaching and Learning Methods
