Learning from a Teacher using Unlabeled Data
Gaurav Menghani, Sujith Ravi

TL;DR
This paper demonstrates that a teacher model in knowledge distillation can transfer meaningful class relationships to a student model even on out-of-distribution unlabeled data, improving the quality of the student model.
Contribution
It shows that the class relationships captured by a teacher model extend beyond the original dataset, so distillation can use unlabeled data from different sources to improve the student model.
Findings
Effective transfer of knowledge on out-of-distribution data
Promising results on MNIST, CIFAR-10, and Caltech-256
Insights into utilizing unlabeled data for model improvement
Abstract
Knowledge distillation is a widely used technique for model compression. We posit that the teacher model used in a distillation setup captures relationships between classes that extend beyond the original dataset. We empirically show that a teacher model can transfer this knowledge to a student model even on an out-of-distribution dataset. Using this approach, we show promising results on MNIST, CIFAR-10, and Caltech-256 datasets using unlabeled image data from different sources. Our results are encouraging and help shed further light on knowledge distillation and on utilizing unlabeled data to improve model quality.
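The abstract does not spell out the training objective, so the following is only a minimal sketch of how distillation on unlabeled data is commonly set up, not the authors' implementation. It assumes the standard soft-target formulation, where the student is trained to match the teacher's temperature-softened class probabilities. The function names, the temperature value, and the example logits are all hypothetical. The point it illustrates is that the loss needs only the teacher's output, never a ground-truth label, which is why unlabeled or out-of-distribution images can be used as inputs.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between teacher and student soft predictions.

    No ground-truth label appears here, so the input image may be
    unlabeled. The temperature of 2.0 is an assumed value.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical logits for one unlabeled image over three classes.
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # ranks the classes in reverse

loss_close = distillation_loss(teacher_logits, close_student)
loss_far = distillation_loss(teacher_logits, far_student)
```

In this sketch, a student whose predictions resemble the teacher's gets a lower loss than one that disagrees, so minimizing the loss over many unlabeled images pushes the student toward the teacher's view of how the classes relate.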
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
Methods: Knowledge Distillation
