Tree-structured Auxiliary Online Knowledge Distillation
Wenye Lin, Yangning Li, Yifeng Ding, Hai-Tao Zheng

TL;DR
This paper introduces Tree-Structured Auxiliary online knowledge distillation (TSA), a hierarchical architecture for online knowledge distillation that reports state-of-the-art results on vision datasets and is also shown to be effective on language tasks.
Contribution
The paper proposes a novel hierarchical architecture for online knowledge distillation that improves knowledge transfer effectiveness across multiple domains.
Findings
Achieves state-of-the-art performance on vision datasets
Demonstrates effectiveness in natural language processing tasks
Presented as the first application of online knowledge distillation to machine translation
Abstract
Traditional knowledge distillation adopts a two-stage training process in which a teacher model is pre-trained and then transfers its knowledge to a compact student model. To remove the dependence on a pre-trained teacher, online knowledge distillation performs one-stage distillation when no teacher is available. Recent research on online knowledge distillation mainly focuses on the design of the distillation objective, including attention or gate mechanisms. Instead, in this work, we focus on the design of the global architecture and propose Tree-Structured Auxiliary online knowledge distillation (TSA), which hierarchically adds more parallel peers for layers close to the output to strengthen the effect of knowledge distillation. Different branches construct different views of the inputs, which can serve as the source of the knowledge. The hierarchical structure implies that the knowledge transfers…
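The tree-shaped layout described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the fan-out of 2, the single shared stage, and the averaged-peer KL objective are all assumptions chosen for illustration, since the truncated abstract does not specify the branching factor or the loss. The sketch only shows how peers that share early stages and split near the output could be enumerated, and how each peer might be distilled toward the peer average.

```python
import math
from itertools import product


def branch_paths(num_stages, fanout=2, shared_stages=1):
    """Enumerate root-to-leaf paths of a tree-shaped network (hypothetical layout).

    The first `shared_stages` stages have a single block shared by all peers;
    every later stage multiplies the number of parallel blocks by `fanout`,
    so stages closer to the output have more peers.
    """
    split_stages = num_stages - shared_stages
    paths = []
    for choice in product(range(fanout), repeat=split_stages):
        path = [(stage, 0) for stage in range(shared_stages)]
        index = 0
        for depth, c in enumerate(choice):
            index = index * fanout + c  # position of the block within its stage
            path.append((shared_stages + depth, index))
        paths.append(path)
    return paths


def blocks_per_stage(paths):
    """Count the distinct blocks used at each stage across all paths."""
    stages = {}
    for path in paths:
        for stage, index in path:
            stages.setdefault(stage, set()).add(index)
    return {stage: len(ix) for stage, ix in sorted(stages.items())}


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def peer_distillation_losses(peer_logits, temperature=3.0):
    """Assumed objective: each peer is pulled toward the average peer prediction."""
    probs = [softmax(logits, temperature) for logits in peer_logits]
    n = len(probs)
    mean = [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]
    return [kl_divergence(mean, p) for p in probs]


if __name__ == "__main__":
    paths = branch_paths(num_stages=3)
    print(blocks_per_stage(paths))
    # Made-up logits for the four peers, purely for demonstration.
    logits = [[2.0, 0.5, 0.1], [1.8, 0.7, 0.2], [0.3, 2.1, 0.4], [1.0, 1.0, 1.0]]
    print(peer_distillation_losses(logits))
```

With three stages and the assumed fan-out of 2, the sketch yields four peers that all share the first block and diverge progressively, which mirrors the idea that branches closer to the output see more parallel peers. In a real model each `(stage, index)` pair would correspond to a network block, and the distillation term would be added to each peer's task loss.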
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
