Locally Linear Region Knowledge Distillation
Xiang Deng, Zhongfei (Mark) Zhang

TL;DR
This paper introduces L2RKD, a knowledge distillation method that transfers knowledge over local, linear regions rather than only at individual training points, so that the student better captures the local shape of the teacher function and performs better.
Contribution
L2RKD is a new approach focusing on local, linear regions for knowledge transfer, enhancing the student's ability to mimic the teacher's local function shape.
Findings
L2RKD outperforms traditional KD and state-of-the-art methods.
L2RKD shows robustness in few-shot learning scenarios.
L2RKD is compatible with existing distillation techniques for further gains.
Abstract
Knowledge distillation (KD) is an effective technique to transfer knowledge from one neural network (teacher) to another (student), thus improving the performance of the student. To make the student better mimic the behavior of the teacher, existing work focuses on designing different criteria to align their logits or representations. Different from these efforts, we address knowledge distillation from a novel data perspective. We argue that transferring knowledge only at sparse training data points does not enable the student to capture the local shape of the teacher function well. To address this issue, we propose locally linear region knowledge distillation (L2RKD), which transfers the knowledge in local, linear regions from a teacher to a student. This is achieved by forcing the student to mimic the outputs of the teacher function in local, linear regions. To this end, the…
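The idea of matching the teacher over a region instead of at isolated points can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the local, linear regions are constructed, so the sketch assumes each region is a short line segment starting at a training point and running in a random unit direction. The function names, the `radius` and `num_samples` parameters, and the temperature-softened KL divergence (the loss from standard KD) are all illustrative assumptions.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sample_on_segment(x, direction, radius, rng):
    """Pick a point on the segment {x + t * direction : 0 <= t <= radius}."""
    t = rng.uniform(0.0, radius)
    return [xi + t * di for xi, di in zip(x, direction)]


def region_distillation_loss(teacher, student, x, rng,
                             radius=0.1, num_samples=8, temperature=4.0):
    """Average teacher-student KL over points sampled near x.

    ASSUMPTION: the local region is modeled as line segments of length
    `radius` leaving x in random unit directions. `teacher` and `student`
    are hypothetical callables mapping an input vector to a list of logits.
    """
    total = 0.0
    for _ in range(num_samples):
        # Draw a random unit direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        direction = [d / norm for d in direction]

        # Compare the two networks at a point inside the region, not at x itself.
        x_local = sample_on_segment(x, direction, radius, rng)
        p_teacher = softmax(teacher(x_local), temperature)
        p_student = softmax(student(x_local), temperature)
        total += kl_divergence(p_teacher, p_student)
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for real networks: fixed functions that return 3 logits.
    teacher = lambda x: [x[0] + x[1], x[0] - x[1], 0.5 * x[0]]
    student = lambda x: [0.0, 0.0, 0.0]
    print(region_distillation_loss(teacher, student, [1.0, 2.0], rng))
```

In a real training loop this term would be computed on mini-batches with a deep learning framework and minimized with respect to the student's parameters. The point of the sketch is only that the student is compared with the teacher at points sampled around each training example, not just at the example itself.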
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Knowledge Distillation
