Spot-adaptive Knowledge Distillation

Jie Song; Ying Chen; Jingwen Ye; Mingli Song

arXiv:2205.02399 · cs.CV · May 6, 2022

TL;DR

This paper introduces spot-adaptive knowledge distillation (SAKD), a strategy that chooses the distillation spots (i.e., layers) in the teacher network separately for each sample at every training iteration, rather than fixing them in advance. It is reported to improve the performance of existing distillation techniques.

Contribution

The paper proposes a novel adaptive distillation strategy that determines distillation spots per sample and epoch, improving upon fixed-spot methods and integrating seamlessly with existing distillers.

Findings

01  SAKD improves performance across 10 state-of-the-art distillers.

02  It enhances distillation in both homogeneous and heterogeneous settings.

03  Experimental results validate the effectiveness of adaptive spot selection.

Abstract

Knowledge distillation (KD) has become a well-established paradigm for compressing deep neural networks. The typical way of conducting knowledge distillation is to train the student network under the supervision of the teacher network to harness the knowledge at one or multiple spots (i.e., layers) in the teacher network. The distillation spots, once specified, will not change for all the training samples throughout the whole distillation process. In this work, we argue that distillation spots should be adaptive to training samples and distillation epochs. We thus propose a new distillation strategy, termed spot-adaptive KD (SAKD), to adaptively determine the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period. As SAKD actually focuses on "where to distill" instead of "what to distill" that is widely investigated by…
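
To make the "where to distill" idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It contrasts a fixed-spot loss, where every candidate spot is used for every sample, with a per-sample gated loss. The function names (`mse`, `spot_adaptive_loss`), the binary `gates` table, and the toy feature values are all assumptions made for illustration. The abstract does not say how SAKD decides which spots are active, so the gates here are simply supplied by hand.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def spot_adaptive_loss(
    teacher_feats: List[List[Sequence[float]]],
    student_feats: List[List[Sequence[float]]],
    gates: List[List[int]],
) -> float:
    """Average a feature-matching loss over the active (sample, spot) pairs.

    teacher_feats[i][k], student_feats[i][k]: features of sample i at spot k.
    gates[i][k]: 1 if spot k is distilled for sample i this iteration, else 0.
    The gates are a hypothetical stand-in for whatever selection rule is used.
    """
    total, active = 0.0, 0
    for t_sample, s_sample, g_sample in zip(teacher_feats, student_feats, gates):
        for t, s, g in zip(t_sample, s_sample, g_sample):
            if g:
                total += mse(t, s)
                active += 1
    # With no active spot there is nothing to distill, so the loss is zero.
    return total / active if active else 0.0


# Toy data: two samples, two candidate spots each, 2-dimensional features.
teacher = [[[1.0, 2.0], [0.0, 0.0]], [[1.0, 1.0], [2.0, 2.0]]]
student = [[[1.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 0.0]]]

fixed_gates = [[1, 1], [1, 1]]     # fixed-spot baseline: all spots, all samples
adaptive_gates = [[1, 0], [0, 1]]  # hand-picked per-sample choice (assumption)

fixed_loss = spot_adaptive_loss(teacher, student, fixed_gates)
adaptive_loss = spot_adaptive_loss(teacher, student, adaptive_gates)
```

In this sketch the distillation loss itself (here a plain MSE) is untouched and only the set of active spots changes per sample, which mirrors the abstract's distinction between "where to distill" and "what to distill". In a real training loop the gates would be recomputed at every iteration.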

Taxonomy

Methods: Knowledge Distillation