Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective
Jinjing Zhu, Songze Li, Lin Wang

TL;DR
This paper redefines knowledge in distillation by incorporating relationships between samples and their in-context neighbors, leading to a novel framework that improves performance across various KD paradigms.
Contribution
It introduces a new in-context knowledge distillation framework that leverages sample relationships via retrieval, with theoretical analysis and state-of-the-art results.
Findings
IC-KD outperforms existing methods on CIFAR-100 and ImageNet.
Theoretical analysis confirms the importance of in-context sample knowledge.
The framework is effective across offline, online, and teacher-free KD.
Abstract
Conventional knowledge distillation (KD) approaches train the student model to produce outputs similar to the teacher model's for each sample. Unfortunately, the relationship across samples of the same class is often neglected. In this paper, we explore redefining the knowledge in distillation to capture the relationship between each sample and its corresponding in-context samples (a group of similar samples with the same or different classes), and perform KD from an in-context sample retrieval perspective. As KD is a type of learned label smoothing regularization (LSR), we first conduct a theoretical analysis showing that the teacher's knowledge from the in-context samples is a crucial contributor to regularizing the student's training on the corresponding samples. Buttressed by the analysis, we propose a novel in-context knowledge distillation (IC-KD) framework that shows its…
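To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' IC-KD implementation. It shows (a) the conventional per-sample KD objective, written here as a temperature-softened KL divergence between teacher and student outputs, and (b) one plausible way to retrieve "in-context" samples and aggregate the teacher's predictions on them. The cosine-similarity retrieval, the feature bank, the mean aggregation, and every function and parameter name (`retrieve_in_context`, `in_context_soft_label`, `k`, `temperature`) are assumptions made for illustration; the truncated abstract does not specify how IC-KD retrieves or uses these samples.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Conventional per-sample KD: KL(teacher || student) on softened outputs."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    # The T^2 factor is the usual rescaling so gradients stay comparable across T.
    return float(np.mean(kl) * temperature ** 2)

def retrieve_in_context(query_feat, bank_feats, k=3):
    """Indices of the k bank samples most cosine-similar to the query.

    Hypothetical retrieval rule: labels are ignored, so the neighbors may
    share the query's class or belong to different classes.
    """
    q = query_feat / np.linalg.norm(query_feat)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    sims = b @ q
    return np.argsort(-sims)[:k]

def in_context_soft_label(query_feat, bank_feats, bank_teacher_logits,
                          k=3, temperature=4.0):
    """Average the teacher's softened predictions over the retrieved neighbors.

    Mean aggregation is an assumption of this sketch, chosen for simplicity.
    """
    idx = retrieve_in_context(query_feat, bank_feats, k)
    probs = np.exp(log_softmax(bank_teacher_logits[idx], temperature))
    return probs.mean(axis=0)
```

In this sketch, `kd_loss` sees only one sample at a time, which is the limitation the abstract points to. The soft label returned by `in_context_soft_label` depends on the teacher's outputs for neighboring samples, so it could serve as an additional, sample-dependent smoothing target alongside the per-sample term.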
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Label Smoothing · Knowledge Distillation
