Learning from a Teacher using Unlabeled Data

Gaurav Menghani; Sujith Ravi

arXiv:1911.05275 · cs.LG · November 14, 2019 · 5 cites

TL;DR

This paper demonstrates that teacher models in knowledge distillation can transfer meaningful class relationships to student models even on out-of-distribution unlabeled data, improving model performance.

Contribution

It shows that the class relationships captured by a teacher model extend beyond its original dataset, so a student can be distilled from the teacher using unlabeled data drawn from different sources.

Findings

01 Effective transfer of knowledge on out-of-distribution data

02 Promising results on MNIST, CIFAR-10, and Caltech-256

03 Insights into utilizing unlabeled data for model improvement

Abstract

Knowledge distillation is a widely used technique for model compression. We posit that the teacher model used in a distillation setup captures relationships between classes that extend beyond the original dataset. We empirically show that a teacher model can transfer this knowledge to a student model even on an out-of-distribution dataset. Using this approach, we show promising results on MNIST, CIFAR-10, and Caltech-256 datasets using unlabeled image data from different sources. Our results are encouraging and help shed further light from the perspective of understanding knowledge distillation and utilizing unlabeled data to improve model quality.
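
The abstract does not state the training objective, so the following is only a minimal sketch of how distillation on unlabeled data is commonly set up, not the authors' implementation. It assumes the teacher's temperature-softened class probabilities serve as targets and that the student is penalized by the KL divergence from them. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    No ground-truth label is needed: the teacher's soft output is the
    only target, which is why unlabeled images can be used.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for a single unlabeled image over three classes.
teacher_logits = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
mismatched_student = [0.5, 1.0, 4.0]

loss_match = distillation_loss(teacher_logits, matching_student)
loss_mismatch = distillation_loss(teacher_logits, mismatched_student)
```

In this sketch, a student that reproduces the teacher's output has a loss of essentially zero, while a student that ranks the classes differently has a strictly larger loss. In a real setup the loss would be averaged over batches of unlabeled images and minimized with respect to the student's parameters.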

Code & Models

Repositories

zju-vipa/mosaickd
pytorch

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation