Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation
Sajjad Abbasi, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi

TL;DR
This paper proposes a comprehensive model to unify and analyze various knowledge distillation techniques in deep neural networks, facilitating better understanding and development of new methods.
Contribution
It introduces a general model that encapsulates existing KD techniques, enabling systematic investigation and comparison of different approaches.
Findings
The model effectively summarizes diverse KD methods.
It highlights advantages and disadvantages of existing techniques.
It provides a framework for developing new KD strategies.
Abstract
Knowledge distillation (KD) is a new method for transferring the knowledge of a structure under training to another one. In its typical form, a small model (the student) learns from soft labels produced by a complex model (the teacher). Because of the novel idea behind KD, its notion has recently been used in other methods, such as compression and processes that aim to improve model accuracy. Although many techniques have been proposed in the area of KD, a model that generalizes them is lacking. In this paper, various studies in the scope of KD are investigated and analyzed to build a general model for KD. All the methods and techniques in KD can be summarized through the proposed model. Using the proposed model, different KD methods can be better investigated and explored. The advantages and disadvantages of different…
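To make the teacher-student setup in the abstract concrete, the following is a minimal sketch of the commonly used soft-label distillation loss: a temperature-softened KL term between teacher and student outputs, blended with the ordinary cross-entropy on the true label. It is not the general model proposed in the paper. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-scaled softmax (hypothetical helper)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend of a soft-label KL term and a hard-label cross-entropy term.

    alpha weights the soft (teacher) term; 1 - alpha weights the hard term.
    Both hyperparameter defaults are illustrative assumptions.
    """
    # Soft labels: teacher and student distributions at the same temperature.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) over the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy against the ground-truth class at temperature 1.
    ce = -log_softmax(student_logits, 1.0)[true_label]

    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures in the usual formulation.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # made-up logits from a large model
    student = [1.5, 0.8, 0.3]   # made-up logits from a small model
    print(distillation_loss(student, teacher, true_label=0))
```

With `alpha=1.0` the student is trained on the teacher's soft labels alone, and the loss is zero when the two sets of logits match. With `alpha=0.0` the sketch reduces to ordinary supervised training.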
