Distillation from heterogeneous unlabeled collections
Jean-Michel Begon, Pierre Geurts

TL;DR
This paper introduces a method for distilling knowledge from a large model into a smaller one using unlabeled, possibly unrelated data. It preferentially samples data points that appear relevant and makes better use of the learning signal, enabling effective compression even when the original training data is no longer available.
Contribution
It presents a novel distillation approach leveraging heterogeneous unlabeled data, improving convergence speed and performance of smaller models.
Findings
Speeds up student model convergence
Boosts student model performance
Achieves results close to original data-based training
Abstract
Compressing deep networks is essential to expand their range of applications to constrained settings. The need for compression, however, often arises long after the model was trained, when the original data might no longer be available. On the other hand, unlabeled data, not necessarily related to the target task, is usually plentiful, especially in image classification tasks. In this work, we propose a scheme to leverage such samples to distill the knowledge learned by a large teacher network to a smaller student. The proposed technique relies on (i) preferentially sampling datapoints that appear related, and (ii) taking better advantage of the learning signal. We show that the former speeds up the student's convergence, while the latter boosts its performance, achieving performance close to what can be expected with the original data.
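The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how relatedness is measured or how the learning signal is exploited. The sketch assumes that teacher confidence (the maximum softmax probability) is a proxy for relatedness, and it uses a standard temperature-softened KL divergence as the distillation loss. All function names (`relatedness_weights`, `sample_indices`, `distillation_loss`) are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def relatedness_weights(teacher_logits_per_sample):
    """ASSUMPTION: a confident teacher prediction suggests the unlabeled
    sample is related to the target task. The paper may score this differently."""
    return [max(softmax(logits)) for logits in teacher_logits_per_sample]


def sample_indices(weights, k, rng):
    """Draw k indices with replacement, with probability proportional to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=k)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.
    This is the classic distillation loss, swapped in for illustration."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


# Toy pool: teacher logits for three unlabeled samples (made-up numbers).
pool_teacher_logits = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 6.0, 0.0]]
weights = relatedness_weights(pool_teacher_logits)
batch = sample_indices(weights, k=4, rng=random.Random(0))
```

In a training loop, each sampled index would be fed to both networks and the student updated to reduce `distillation_loss`. Samples on which the teacher is near-uniform, such as the second one in the toy pool, are drawn less often.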
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · COVID-19 diagnosis using AI
