Review helps learn better: Temporal Supervised Knowledge Distillation
Dongwei Wang, Zhi Han, Yanmei Wang, Xiai Chen, Baichen Liu, Yandong Tang

TL;DR
This paper introduces Temporal Supervised Knowledge Distillation (TSKD), a method that leverages the temporal evolution of feature maps during training to improve neural network performance across tasks.
Contribution
We propose TSKD, which uses Conv-LSTM to extract temporal features and trains the student network with dynamic targets, enhancing knowledge refinement during training.
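The Conv-LSTM step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-channel feature maps, 3x3 kernels, no bias and no peephole terms, and random untrained weights. The names ConvLSTMCell, temporal_features, and mse are hypothetical, and the summary does not specify how the dynamic target is built, so the caller supplies it here as a placeholder.

```python
import math
import random


def conv2d_same(x, k):
    """Zero-padded 'same' cross-correlation of a 2D list x with a 2D kernel k."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - kh // 2, j + b - kw // 2
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[a][b]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvLSTMCell:
    """Simplified single-channel Conv-LSTM cell (assumption: no bias, no peepholes)."""

    GATES = "ifog"  # input, forget, output, candidate

    def __init__(self, seed=0):
        rng = random.Random(seed)

        def kernel():
            return [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]

        # One input-to-state and one state-to-state kernel per gate.
        self.wx = {g: kernel() for g in self.GATES}
        self.wh = {g: kernel() for g in self.GATES}

    def step(self, x, h, c):
        """Advance one time step; returns the new hidden and cell maps."""
        pre = {}
        for g in self.GATES:
            cx = conv2d_same(x, self.wx[g])
            ch = conv2d_same(h, self.wh[g])
            pre[g] = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(cx, ch)]
        rows, cols = len(x), len(x[0])
        new_h = [[0.0] * cols for _ in range(rows)]
        new_c = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                i_g = sigmoid(pre["i"][i][j])
                f_g = sigmoid(pre["f"][i][j])
                o_g = sigmoid(pre["o"][i][j])
                cand = math.tanh(pre["g"][i][j])
                new_c[i][j] = f_g * c[i][j] + i_g * cand
                new_h[i][j] = o_g * math.tanh(new_c[i][j])
        return new_h, new_c


def temporal_features(snapshots, cell):
    """Run the cell over feature-map snapshots taken at successive training phases."""
    rows, cols = len(snapshots[0]), len(snapshots[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    for x in snapshots:
        h, c = cell.step(x, h, c)
    return h  # final hidden map summarizes the sequence


def mse(a, b):
    """Mean squared error between two equally sized 2D maps."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


if __name__ == "__main__":
    cell = ConvLSTMCell(seed=0)
    # Hypothetical 4x4 student feature maps saved at three training phases.
    snapshots = [[[0.1 * (t + 1)] * 4 for _ in range(4)] for t in range(3)]
    summary = temporal_features(snapshots, cell)
    placeholder_target = snapshots[-1]  # stand-in only, not the paper's target
    print(mse(summary, placeholder_target))
```

In practice a deep learning framework would replace the hand-written convolution, and the kernels would be trained rather than sampled at random; the sketch only shows how a sequence of feature maps from different training phases is folded into one spatiotemporal summary.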
Findings
TSKD outperforms existing knowledge distillation methods.
It is effective across various architectures and tasks.
It improves training efficiency and accuracy.
Abstract
Reviewing plays an important role in learning. Knowledge acquired at a given point in time may be strongly aided by previous experience, so the growth of knowledge should show a strong relationship along the temporal dimension. In our research, we find that during network training, the evolution of feature maps exhibits the properties of a temporal sequence, and that proper temporal supervision may further improve training performance. Inspired by this observation, we propose Temporal Supervised Knowledge Distillation (TSKD). Specifically, we extract spatiotemporal features from the different training phases of the student using a convolutional long short-term memory network (Conv-LSTM). We then train the student network against a dynamic target rather than static teacher network features. This process realizes the refinement of old knowledge in student…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation · Memory Network
