Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

Junjie Liu; Dongchao Wen; Hongxing Gao; Wei Tao; Tse-Wei Chen; Kinya Osa; Masami Kato

arXiv:1911.05329·cs.CV·November 14, 2019·1 cites

TL;DR

This paper introduces a novel knowledge representation framework for knowledge distillation that models prior knowledge through parameter distribution aggregation and a sparse recoding penalty, improving performance even with limited-capacity student networks.

Contribution

It proposes a knowledge representing (KR) framework that combines parameter aggregation with a sparse recoding penalty, aiming to keep distillation effective even when the student network has limited capacity.

Findings

01. Achieves state-of-the-art performance in knowledge distillation.

02. Effective even when student networks have limited capacity.

03. Compatible with other posterior-based KD methods.

Abstract

Although recent work on knowledge distillation (KD) has achieved further improvements by elaborately modeling the decision boundary as posterior knowledge, its performance still depends on the assumption that the target network has a powerful capacity (representation ability). In this paper, we propose a knowledge representing (KR) framework that focuses mainly on modeling the parameter distribution as prior knowledge. First, we propose a knowledge aggregation scheme to answer how to represent the prior knowledge from the teacher network. By aggregating the parameter distribution of the teacher network into a more abstract level, the scheme alleviates the phenomenon of residual accumulation in the deeper layers. Second, to address the critical issue of what the most important prior knowledge is for better distillation, we design a sparse recoding penalty…
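
For readers unfamiliar with the posterior-style distillation that the abstract contrasts against, here is a minimal sketch of the classic temperature-softened soft-target loss. It shows the generic formulation from the KD literature, not the KR framework proposed in the paper. The function names, the default temperature, and the example logits are all assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-softened softmax (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared.

    Illustrative assumption: this is the generic soft-target loss, not the
    paper's aggregation scheme or sparse recoding penalty.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]  # hypothetical teacher logits
    student = [3.0, 2.0, 1.0]  # hypothetical student logits
    print(kd_loss(student, teacher))
    print(kd_loss(teacher, teacher))
```

The loss is zero when the student and teacher logits agree and grows as the softened distributions diverge. Because this signal comes only from the teacher's outputs, it is the kind of posterior knowledge the abstract says depends on student capacity.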

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation