Locally Linear Region Knowledge Distillation

Xiang Deng; Zhongfei (Mark) Zhang

arXiv:2010.04812·cs.LG·October 20, 2020

TL;DR

This paper introduces L2RKD (locally linear region knowledge distillation), a distillation method that transfers knowledge over local, linear regions rather than only at sparse training points, so that the student better captures the local shape of the teacher function.

Contribution

The paper approaches knowledge distillation from a data perspective: instead of designing a new criterion for aligning logits or representations, L2RKD has the student mimic the teacher's outputs over local, linear regions.

Findings

01. L2RKD outperforms traditional KD and state-of-the-art methods.

02. L2RKD shows robustness in few-shot learning scenarios.

03. L2RKD is compatible with existing distillation techniques for further gains.

Abstract

Knowledge distillation (KD) is an effective technique for transferring knowledge from one neural network (the teacher) to another (the student), thereby improving the student's performance. To make the student better mimic the teacher's behavior, existing work focuses on designing different criteria to align their logits or representations. In contrast, we address knowledge distillation from a novel data perspective. We argue that transferring knowledge only at sparse training data points does not allow the student to capture the local shape of the teacher function well. To address this issue, we propose locally linear region knowledge distillation ($L^{2}$RKD), which transfers knowledge in local, linear regions from a teacher to a student. This is achieved by forcing the student to mimic the outputs of the teacher function in local, linear regions. To this end, the…
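The idea of matching the teacher away from the training points can be illustrated with a small sketch. The abstract is cut off before it says how the local, linear regions are built, so everything below is an assumption made for illustration and not the authors' procedure: `sample_local_points` is a hypothetical helper that picks points on line segments between pairs of training inputs, the loss is the standard temperature-scaled KL divergence used in ordinary KD, and the teacher and student are toy linear maps standing in for real networks.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature, shifted for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sample_local_points(x, rng, max_step=0.5):
    """Hypothetical sampler (assumption): move each x_i part of the way
    toward a randomly chosen partner x_j, staying on the segment between them."""
    partner = rng.permutation(len(x))
    lam = rng.uniform(0.0, max_step, size=(len(x), 1))
    return (1.0 - lam) * x + lam * x[partner]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard temperature-scaled KL(teacher || student), averaged over the batch."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    return float(temperature ** 2 * kl.mean())

# Toy stand-ins for the two networks (assumption): fixed linear maps.
rng = np.random.default_rng(0)
w_teacher = rng.normal(size=(5, 3))
w_student = rng.normal(size=(5, 3))
x_train = rng.normal(size=(8, 5))

# Evaluate the distillation loss at sampled points instead of only at x_train.
x_local = sample_local_points(x_train, rng)
loss = kd_loss(x_local @ w_student, x_local @ w_teacher)
print(f"distillation loss on sampled points: {loss:.4f}")
```

In a real training loop this loss would be minimized with respect to the student's parameters; the sketch only shows where the sampled points enter the objective.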

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Knowledge Distillation