Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better

Xuanyang Zhang; Xiangyu Zhang; Jian Sun

arXiv:2109.12507·cs.CV·September 28, 2021

TL;DR

This paper introduces PWKD, a progressive knowledge distillation method that decomposes teacher knowledge into sub-networks of increasing capacity, enabling students to learn from partial to complete knowledge and improving distillation effectiveness.

Contribution

The paper proposes a novel PWKD paradigm that reconstructs teachers into weight-sharing sub-networks to decompose knowledge, enhancing distillation by leveraging knowledge quantity.
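The partial-to-whole idea can be illustrated with a minimal sketch. It is not the authors' implementation: the stage widths, the equal-length stage schedule, and the function names `teacher_width_for_epoch` and `kd_loss` are all assumptions made for illustration. The loss shown is the standard temperature-softened KL distillation loss, used here only as one example of an offline distillation objective that such a schedule could wrap.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard soft-target distillation loss: T^2 * KL(teacher || student).

    Assumption: this plain KD loss stands in for whichever offline
    distillation objective is being combined with the schedule below.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def teacher_width_for_epoch(epoch, total_epochs, widths=(0.25, 0.5, 0.75, 1.0)):
    """Hypothetical schedule: pick which weight-sharing sub-network teaches.

    Training is split into equal-length stages, one per width fraction,
    so the student sees the smallest sub-network first and the whole
    teacher (width 1.0) last. The widths and equal split are assumptions.
    """
    stage_length = total_epochs / len(widths)
    stage = min(int(epoch / stage_length), len(widths) - 1)
    return widths[stage]


if __name__ == "__main__":
    # Print the sub-network width used at a few points in a 100-epoch run.
    for epoch in (0, 25, 50, 75, 99):
        print(epoch, teacher_width_for_epoch(epoch, total_epochs=100))
```

In a real training loop, the width returned for the current epoch would select the teacher sub-network whose logits are passed to the distillation loss; how the sub-networks are built and trained is described in the paper itself and is not reproduced here.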

Findings

1. PWKD improves existing distillation methods on CIFAR-100 and ImageNet.

2. Decomposed knowledge from sub-networks accelerates student learning.

3. PWKD is compatible with various offline distillation approaches.

Abstract

The knowledge distillation field has carefully designed various types of knowledge to shrink the performance gap between a compact student and a large-scale teacher. These existing distillation approaches focus solely on improving knowledge quality, but ignore the significant influence of knowledge quantity on the distillation procedure. In contrast to conventional distillation approaches, which extract knowledge from a fixed teacher computation graph, this paper explores a non-negligible research direction from the novel perspective of knowledge quantity to further improve the efficacy of knowledge distillation. We introduce a new concept of knowledge decomposition, and further put forward the Partial to Whole Knowledge Distillation (PWKD) paradigm. Specifically, we reconstruct the teacher into weight-sharing sub-networks…

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Teaching and Learning Programming · Internet of Things and AI

Methods: Knowledge Distillation