VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale

Zhiwei Hao; Jianyuan Guo; Kai Han; Han Hu; Chang Xu; Yunhe Wang

arXiv:2305.15781 · cs.CV · May 26, 2023 · 5 cites

Open Access · 1 Repo

TL;DR

This paper revisits vanilla knowledge distillation, demonstrating its effectiveness on large-scale datasets like ImageNet-1K when combined with strong data augmentation and training strategies, achieving state-of-the-art results.

Contribution

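The paper identifies the small data pitfall in previous KD methods: evaluating only on small-scale datasets underestimates vanilla KD. It shows that vanilla KD becomes competitive in large-scale scenarios when paired with stronger data augmentation and training strategies.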

Findings

01. Vanilla KD performs strongly on large datasets like ImageNet-1K.

02. Stronger data augmentation reduces the gap between vanilla and advanced KD methods.

03. Vanilla KD achieves state-of-the-art accuracy on multiple models.

Abstract

The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results. It is therefore imperative to reconsider whether knowledge distillation (KD) approaches for limited-capacity architectures should be designed solely on the basis of small-scale datasets. In this paper, we identify the small data pitfall present in previous KD methods, which leads to underestimating the power of the vanilla KD framework on large-scale datasets such as ImageNet-1K. Specifically, we show that employing stronger data augmentation techniques and using larger datasets can directly decrease the gap between vanilla KD and other meticulously designed KD variants. This highlights the necessity of designing and evaluating KD approaches in the context of practical scenarios, casting off the limitations of…
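For readers unfamiliar with the term, vanilla KD usually refers to the original logit-based formulation: the student is trained on a weighted sum of the cross-entropy with the ground-truth label and the KL divergence between temperature-softened teacher and student distributions. The sketch below illustrates that standard loss for a single example in plain Python. The function names, the weight alpha, and the default temperature are illustrative assumptions, not values taken from the paper or its repository.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def vanilla_kd_loss(student_logits, teacher_logits, label,
                    temperature=1.0, alpha=0.5):
    """Standard logit-based KD loss for one example (illustrative sketch).

    alpha and temperature are hypothetical defaults, not the paper's settings.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -log_softmax(student_logits)[label]

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [3.0, 0.0, -2.0]
    print(vanilla_kd_loss(student, teacher, label=0, temperature=2.0))
```

When the student and teacher logits are identical, the KL term is zero and the loss reduces to the weighted cross-entropy, which is a quick sanity check for any implementation of this loss.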

Code & Models

Repositories

hao840/vanillakd
PyTorch · Official

Taxonomy

Topics: COVID-19 diagnosis using AI · Advanced Neural Network Applications · Artificial Intelligence in Healthcare and Education

Methods: Knowledge Distillation