Improving Knowledge Distillation via Regularizing Feature Norm and Direction

Yuzhu Wang; Lechao Cheng; Manni Duan; Yongheng Wang; Zunlei Feng; Shu Kong

arXiv:2305.17007 · cs.CV · May 29, 2023 · 5 cites

TL;DR

This paper introduces ND loss, a regularization term for knowledge distillation that encourages the student to produce large-norm features and aligns the direction of student features with the class-means of teacher features, improving student performance on ImageNet, CIFAR100, and COCO.

Contribution

It proposes a simple yet effective ND loss that encourages the student to produce large-norm features and aligns their direction with the class-means of teacher features; the loss can be added to existing knowledge distillation methods.
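To make the two goals in the contribution concrete, here is a minimal Python sketch of a single-sample loss that couples feature norm and direction. It is not the paper's ND loss, whose formula is not given on this page; `nd_style_loss` and its normalization by the class-mean norm are hypothetical choices made for illustration, using plain lists instead of tensors.

```python
import math


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_norm(a):
    """Euclidean (L2) norm of a feature vector."""
    return math.sqrt(dot(a, a))


def nd_style_loss(student_feat, teacher_class_mean):
    """Hypothetical norm-and-direction loss for a single sample.

    NOT the paper's exact ND loss. The student feature is projected onto
    the unit direction of the teacher class-mean; that projection equals
    ||f_s|| * cos(theta), so it grows when the student feature turns toward
    the class-mean (direction) and when its norm grows (norm).
    """
    c_norm = l2_norm(teacher_class_mean)
    if c_norm == 0.0:
        raise ValueError("teacher class-mean must be a non-zero vector")
    # Signed length of the student feature along the class-mean direction.
    projection = dot(student_feat, teacher_class_mean) / c_norm
    # Normalize by the class-mean norm so a student feature equal to the
    # class-mean gives a loss of exactly 0; longer aligned features go below 0.
    return 1.0 - projection / c_norm


mean = [3.0, 4.0]
print(nd_style_loss([3.0, 4.0], mean))   # aligned, same norm      → 0.0
print(nd_style_loss([6.0, 8.0], mean))   # aligned, twice the norm → -1.0
print(nd_style_loss([4.0, -3.0], mean))  # orthogonal              → 1.0
```

The sketch uses one projection term rather than two separate losses: because the projection equals the student norm times the cosine of the angle, both rotating toward the class-mean and growing the norm lower the loss. As written, the term is unbounded below, so an actual training objective would presumably need to cap or weight it.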

Findings

1. Achieves state-of-the-art results on the ImageNet, CIFAR100, and COCO datasets.
2. Improves classification accuracy and object detection precision with the proposed techniques.
3. Enhances existing KD methods when feature norm and direction regularization is added to them.
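The third finding says norm and direction regularization can be added on top of existing KD methods. As a hedged sketch, the block below pairs a classic logit-distillation loss (KL divergence between temperature-softened teacher and student distributions, scaled by T², the standard Hinton-style formulation) with an additive, weighted feature regularizer. The helper names, the default temperature of 4.0, and the `weight` parameter are illustrative assumptions, not settings taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """Classic logit distillation: KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 (temperature=4.0 is an illustrative default)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return kl * temperature ** 2


def kd_plus_feature_reg(student_logits, teacher_logits, feature_reg, weight=1.0, temperature=4.0):
    """Hypothetical combination: an existing KD loss plus a weighted feature
    regularizer (for example, a norm/direction term computed elsewhere)."""
    return kd_kl_loss(student_logits, teacher_logits, temperature) + weight * feature_reg


teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(kd_plus_feature_reg(student, teacher, feature_reg=0.25, weight=2.0))
```

Keeping the regularizer as a separate additive term is what lets it be attached to different base KD losses without changing them.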

Abstract

Knowledge distillation (KD) exploits a large, well-trained model (i.e., the teacher) to train a small student model on the same dataset for the same task. Treating teacher features as knowledge, prevailing methods of knowledge distillation train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features. While it is natural to believe that better alignment of student features to the teacher better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance, e.g., classification accuracy. In this work, we propose to align student features with the class-means of teacher features, since the class-mean naturally serves as a strong classifier. To this end, we explore baseline techniques such as adopting the cosine distance based loss to encourage…
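The abstract contrasts direct feature alignment with aligning student features to teacher class-means, and names a cosine distance loss as a baseline. The minimal pure-Python sketch below assumes features are plain lists and labels are hashable class ids. It shows how per-class teacher means can be computed, how a cosine distance loss compares a student feature to its class-mean, and how those means can act as a nearest-class-mean classifier. All function names here are hypothetical.

```python
import math


def class_means(features, labels):
    """Average the teacher features belonging to each class label."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance_loss(student_feat, teacher_class_mean):
    """Baseline direction loss: 1 - cos(student feature, teacher class-mean)."""
    return 1.0 - cosine_similarity(student_feat, teacher_class_mean)


def nearest_class_mean(feat, means):
    """Predict the class whose teacher class-mean is most cosine-similar."""
    return max(means, key=lambda label: cosine_similarity(feat, means[label]))


teacher_feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]
means = class_means(teacher_feats, labels)          # {0: [2.0, 0.0], 1: [0.0, 3.0]}
print(nearest_class_mean([5.0, 1.0], means))        # → 0
print(cosine_distance_loss([4.0, 0.0], means[0]))   # → 0.0
```

Note that the cosine distance loss ignores the student feature's norm entirely, which is why the abstract treats it as a baseline for the direction goal only.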

Code & Models

Repositories

wangyz1608/knowledge-distillation-via-nd (PyTorch, official)

Taxonomy

Topics: COVID-19 diagnosis using AI · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Pruning · ALIGN · Knowledge Distillation