A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation
Ziwei Liu, Yongtao Wang, Xiaojie Chu

TL;DR
This paper introduces a simple, generic framework for feature distillation that aligns teacher and student features along the channel dimension using a learnable nonlinear transformation, improving performance across various vision tasks.
Contribution
It proposes a novel channel-wise feature alignment method with a learnable transformation, enhancing distillation effectiveness and versatility in multiple computer vision applications.
Findings
Improves image classification accuracy by 3.28% on ImageNet.
Enhances object detection mAP by 3.9% on MS COCO.
Boosts semantic segmentation mIoU by 4.66% on Cityscapes.
Abstract
Knowledge distillation is a popular technique for transferring knowledge from a large teacher model to a smaller student model by having the student mimic the teacher. However, directly aligning the feature maps of the teacher and the student may impose overly strict constraints on the student and thus degrade its performance. To alleviate this feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student with a pixel-wise transformation. In this paper, we find that aligning the feature maps of the teacher and the student along the channel dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher model. Based on it, we further propose a simple and generic…
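To make the idea of a channel-wise transformation concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract only states that the transformation is learnable, nonlinear, and acts along the channel dimension. The two-layer MLP with ReLU, the choice to transform the student features, the mean-squared-error loss, and all names (`channel_transform`, `feature_distill_loss`, the shapes) are assumptions made for illustration. The sketch shows only the forward pass and the loss; in training, the weights would be updated jointly with the student.

```python
import numpy as np


def channel_transform(feat, w1, b1, w2, b2):
    """Apply a two-layer MLP along the channel axis of a (C, H, W) feature map.

    Assumed form (hypothetical): the same MLP is applied independently at every
    spatial location, which is equivalent to two 1x1 convolutions with a ReLU.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                       # (C_in, H*W)
    hidden = np.maximum(w1 @ x + b1[:, None], 0.0)   # ReLU nonlinearity
    out = w2 @ hidden + b2[:, None]                  # (C_out, H*W)
    return out.reshape(-1, h, w)


def feature_distill_loss(student_feat, teacher_feat, params):
    """Mean squared error between transformed student features and teacher features.

    The MSE loss is an assumption; the abstract does not name the loss.
    """
    aligned = channel_transform(student_feat, *params)
    return float(np.mean((aligned - teacher_feat) ** 2))


# Toy example with hypothetical sizes: 4 student channels, 8 teacher channels.
rng = np.random.default_rng(0)
c_s, c_t, hid, h, w = 4, 8, 16, 5, 5
params = (
    rng.normal(size=(hid, c_s)) * 0.1,  # w1: student channels -> hidden
    np.zeros(hid),                      # b1
    rng.normal(size=(c_t, hid)) * 0.1,  # w2: hidden -> teacher channels
    np.zeros(c_t),                      # b2
)
student = rng.normal(size=(c_s, h, w))
teacher = rng.normal(size=(c_t, h, w))
loss = feature_distill_loss(student, teacher, params)
```

Because the transformation mixes channels but treats each spatial location independently, it leaves the spatial layout of the student features untouched. This contrasts with the pixel-wise (spatial) alignment that the abstract attributes to earlier work.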
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Infrastructure Maintenance and Monitoring
Methods: Depthwise Convolution · Global Average Pooling · Auxiliary Classifier · Average Pooling · Softmax · Batch Normalization · Dilated Convolution · Pointwise Convolution · Convolution · Depthwise Separable Convolution
