Extracting knowledge from features with multilevel abstraction
Jinhong Lin, Zhaoyang Li

TL;DR
This paper introduces a novel self-knowledge distillation method that extracts knowledge from multilevel abstraction features, demonstrating improved performance and generalization across various tasks and models.
Contribution
The paper proposes a new self-knowledge distillation (SKD) approach based on multilevel feature abstraction, in contrast to the two existing streams of SKD methods: data augmentation and refined knowledge auxiliary.
Findings
Effective across multiple tasks
Generalizes well to different model structures
Code released on GitHub
Abstract
Knowledge distillation transfers knowledge from a large teacher model to a small student model, substantially improving the student's performance. The student network can then replace the teacher network on low-resource devices, thanks to its higher performance, fewer parameters, and shorter inference time. Self-knowledge distillation (SKD), in which the student model serves as its own teacher, has recently attracted considerable attention. To the best of our knowledge, SKD methods fall into two main streams: data augmentation and refined knowledge auxiliary. In this paper, we propose a novel SKD method that departs from these mainstream approaches: it distills knowledge from multilevel abstraction features. Experiments and ablation studies show its effectiveness and generalization on…
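For readers new to the teacher-student setup the abstract starts from, the following is a minimal sketch of the generic soft-target distillation loss: a temperature-scaled KL divergence between teacher and student output distributions. It is not the paper's multilevel-abstraction method, whose details are not given in this abstract. The function names, the temperature value, and the example logits are all illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper illustrating classic teacher-student distillation;
    the T^2 factor is the usual convention for keeping gradient magnitudes
    comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return temperature ** 2 * kl


# Illustrative logits for a 3-class problem (made-up values).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

In a self-distillation setting the "teacher" signal would be produced by the model itself rather than by a separate larger network; how the paper derives that signal from multilevel abstraction features is left to the full text.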
Taxonomy
Topics: Advanced Graph Neural Networks · Machine Learning and Data Classification · Online Learning and Analytics
Methods: Knowledge Distillation
