Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better
Xuanyang Zhang, Xiangyu Zhang, Jian Sun

TL;DR
This paper introduces PWKD, a progressive knowledge distillation method that decomposes teacher knowledge into sub-networks of increasing capacity, enabling students to learn from partial to complete knowledge and improving distillation effectiveness.
Contribution
The paper proposes the PWKD paradigm, which reconstructs the teacher into weight-sharing sub-networks of increasing capacity to decompose its knowledge, so that distillation can exploit knowledge quantity as well as knowledge quality.
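The partial-to-whole idea can be illustrated with a minimal sketch. It is not the authors' implementation. The toy one-hidden-layer teacher, the width fractions, the equal-length stage schedule, and the temperature-softened KL loss (the classic soft-target distillation loss, picked here only for illustration) are all assumptions, and every function name is hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    Assumption: a soft-target loss stands in for whichever offline
    distillation loss is actually being combined with the schedule.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def subnetwork_logits(x, w_hidden, w_out, width_fraction):
    """Forward pass through a toy teacher using only the first
    `width_fraction` of its hidden units.

    Every sub-network slices the same weight lists, which is the
    weight-sharing assumption of this sketch.
    """
    n_hidden = len(w_hidden)
    k = max(1, round(width_fraction * n_hidden))
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(w_hidden[j], x)))  # ReLU
        for j in range(k)
    ]
    return [sum(row[j] * hidden[j] for j in range(k)) for row in w_out]


def teacher_width_for_epoch(epoch, total_epochs, widths=(0.25, 0.5, 0.75, 1.0)):
    """Hypothetical schedule: equal-length stages, narrowest teacher first."""
    stage = min(len(widths) - 1, epoch * len(widths) // total_epochs)
    return widths[stage]


# Toy shared teacher weights: 4 hidden units, 2 inputs, 3 classes.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
W_OUT = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]]

if __name__ == "__main__":
    x = [1.0, 2.0]
    student_logits = [0.0, 0.0, 0.0]  # placeholder student output
    for epoch in range(8):
        width = teacher_width_for_epoch(epoch, total_epochs=8)
        teacher_logits = subnetwork_logits(x, W_HIDDEN, W_OUT, width)
        print(epoch, width, kd_loss(student_logits, teacher_logits))
```

In a real training loop, the student would be updated against the sub-network selected for the current stage, ending with the full teacher. How the sub-networks are actually trained and how the stages are scheduled is left to the paper.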
Findings
PWKD improves existing distillation methods on CIFAR-100 and ImageNet.
Decomposed knowledge from sub-networks accelerates student learning.
PWKD is compatible with various offline distillation approaches.
Abstract
Knowledge distillation field delicately designs various types of knowledge to shrink the performance gap between compact student and large-scale teacher. These existing distillation approaches simply focus on the improvement of knowledge quality, but ignore the significant influence of knowledge quantity on the distillation procedure. Opposed to the conventional distillation approaches, which extract knowledge from a fixed teacher computation graph, this paper explores a non-negligible research direction from a novel perspective of knowledge quantity to further improve the efficacy of knowledge distillation. We introduce a new concept of knowledge decomposition, and further put forward the Partial to Whole Knowledge Distillation (PWKD) paradigm. Specifically, we reconstruct teacher into weight-sharing sub-networks…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Teaching and Learning Programming · Internet of Things and AI
Methods: Knowledge Distillation
