Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution
Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

TL;DR
This paper introduces extracurricular learning, a novel knowledge distillation method that significantly reduces the accuracy gap between teacher and student models by modeling output distributions and sampling from an extended data set.
Contribution
It proposes a new knowledge distillation technique that improves student model accuracy by incorporating uncertain samples and modeling output distributions.
Findings
Reduces the teacher-student accuracy gap by 46% to 68% compared to standard knowledge distillation
Achieves 16% regression error reduction on MPIIGaze
Improves top-1 classification accuracy on CIFAR100 and ImageNet
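To make the gap-reduction figure concrete, here is a minimal sketch of how such a percentage can be computed. The function name and the accuracy values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def gap_reduction(teacher_acc, baseline_student_acc, new_student_acc):
    """Fraction of the teacher-student accuracy gap closed by a new student,
    relative to a baseline student (e.g. one trained with standard distillation)."""
    baseline_gap = teacher_acc - baseline_student_acc
    new_gap = teacher_acc - new_student_acc
    if baseline_gap <= 0:
        raise ValueError("baseline student must be less accurate than the teacher")
    return 1.0 - new_gap / baseline_gap


# Hypothetical accuracies (percent), not taken from the paper.
reduction = gap_reduction(teacher_acc=80.0,
                          baseline_student_acc=76.0,
                          new_student_acc=78.0)
print(f"gap reduced by {reduction:.0%}")
```

Under this reading, a 46% to 68% reduction means the new student closes roughly half to two thirds of the gap left by the baseline student.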
Abstract
Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set, including uncertain samples. We conduct rigorous evaluations on regression and classification tasks and show that, compared to standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%. This leads to major accuracy improvements compared to the…
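The three steps in the abstract can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the linear "teacher" and "student", the mixup-style convex-combination sampler used as a stand-in for "an approximation to the underlying data distribution", and the KL-divergence matching loss are all assumptions made here for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax: turns logits into a categorical output distribution."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sample_extended(x, rng, alpha=1.0):
    """Assumed sampler: convex combinations of training inputs (mixup-style).
    Stands in for sampling from an approximation to the data distribution."""
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1))
    perm = rng.permutation(x.shape[0])
    return lam * x + (1.0 - lam) * x[perm]


def distribution_matching_loss(teacher_logits, student_logits, eps=1e-12):
    """Mean KL(teacher || student) between output distributions over a batch."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(np.mean(kl))


rng = np.random.default_rng(0)
x_train = rng.normal(size=(8, 4))      # toy training inputs
w_teacher = rng.normal(size=(4, 3))    # hypothetical linear teacher
w_student = rng.normal(size=(4, 3))    # hypothetical linear student

# (2) sample beyond the empirical training set
x_ext = sample_extended(x_train, rng)
# (1) + (3) model both outputs as distributions and match them on the extended set
loss = distribution_matching_loss(x_ext @ w_teacher, x_ext @ w_student)
print(loss)
```

In a real training loop the student parameters would be updated to minimize this loss; the sketch only evaluates it once to show where the extended samples enter.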
Taxonomy
Methods: Knowledge Distillation
