Channel-wise Knowledge Distillation for Dense Prediction

Changyong Shu; Yifan Liu; Jianfei Gao; Zheng Yan; Chunhua Shen

arXiv:2011.13256 · cs.CV · August 30, 2021

TL;DR

This paper introduces a novel channel-wise knowledge distillation method for semantic segmentation that aligns feature channels between teacher and student networks using KL divergence, improving performance and efficiency.

Contribution

It proposes a new channel-wise distillation approach that focuses on soft distribution alignment of feature channels, outperforming existing spatial methods in semantic segmentation.

Findings

01. Outperforms existing spatial distillation methods in semantic segmentation.

02. Requires less computational cost during training.

03. Achieves superior performance on multiple benchmarks.

Abstract

Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. Almost all KD variants for dense prediction tasks align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogous to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a probability map using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. In particular, the KL…
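
The two steps described in the abstract (per-channel softmax over spatial locations, then KL divergence between corresponding channels) might be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The function names, the temperature `tau`, the KL direction (teacher relative to student), and the averaging over channels are all assumptions made for illustration. The sketch also assumes the teacher and student feature maps already share the same shape `(C, H, W)`.

```python
import numpy as np


def channel_softmax(feat, tau=1.0):
    """Turn each channel of a (C, H, W) feature map into a probability map.

    Returns an array of shape (C, H*W) whose rows each sum to 1.
    `tau` is a hypothetical temperature; tau=1.0 gives a plain softmax.
    """
    c = feat.shape[0]
    flat = feat.reshape(c, -1) / tau
    # Subtract the per-channel maximum for numerical stability.
    flat = flat - flat.max(axis=1, keepdims=True)
    exp = np.exp(flat)
    return exp / exp.sum(axis=1, keepdims=True)


def channel_wise_kl(teacher_feat, student_feat, tau=1.0, eps=1e-12):
    """Mean over channels of KL(teacher_channel || student_channel).

    The KL direction and the mean reduction are assumptions of this sketch.
    """
    p_t = channel_softmax(teacher_feat, tau)
    p_s = channel_softmax(student_feat, tau)
    # KL divergence per channel, summed over spatial locations.
    kl_per_channel = np.sum(
        p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1
    )
    return kl_per_channel.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 8, 8))  # toy teacher features (C, H, W)
    student = rng.normal(size=(4, 8, 8))  # toy student features (C, H, W)
    print("loss, different features:", channel_wise_kl(teacher, student))
    print("loss, identical features:", channel_wise_kl(teacher, teacher))
```

Because each channel is normalized over its own spatial locations, the loss compares where each channel concentrates its activation rather than the raw activation magnitudes. If the student has a different number of channels than the teacher, some projection step would be needed before this comparison; that is outside the scope of this sketch.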

Taxonomy

Methods: Softmax