MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks
Zhiqiang Shen, Marios Savvides

TL;DR
This paper presents a simple distillation framework that raises vanilla ResNet-50 to over 80% top-1 accuracy on ImageNet without tricks or additional data, setting a strong new baseline for the unmodified architecture.
Contribution
The authors propose a straightforward distillation method that reaches over 80% top-1 accuracy on ImageNet with vanilla ResNet-50, surpassing previous methods without extra tricks or data.
Findings
ResNet-50 achieves 80.67% top-1 accuracy on ImageNet.
The method improves ResNet-18 from 69.76% to 73.19%.
The approach outperforms previous state-of-the-art results without architecture modifications.
Abstract
We introduce a simple yet effective distillation framework that is able to boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without tricks. We construct such a framework through analyzing the problems in the existing classification system and simplify the base method ensemble knowledge distillation via discriminators by: (1) adopting the similarity loss and discriminator only on the final outputs and (2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision. Intriguingly, three novel perspectives are presented for distillation: (1) weight decay can be weakened or even completely removed since the soft label also has a regularization effect; (2) using a good initialization for students is critical; and (3) one-hot/hard label is not necessary in the distillation process if the weights are well initialized. We show that such a…
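The second simplification in the abstract, averaging the softmax probabilities of all teachers into one soft label, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes KL divergence as the similarity loss, omits the discriminator entirely, and uses made-up logits. The function names (`softmax`, `ensemble_soft_label`, `kl_divergence`) are hypothetical and chosen only for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_label(teacher_logits):
    """Average the softmax probabilities of all teachers (one logit list per teacher)."""
    probs = [softmax(logits) for logits in teacher_logits]
    n_teachers = len(probs)
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / n_teachers for k in range(n_classes)]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred); assumed here as the similarity loss on final outputs."""
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred))


# Illustrative values only: two teachers, one student, three classes.
teacher_logits = [[2.0, 1.0, 0.1], [1.5, 1.2, 0.3]]
student_logits = [1.0, 1.0, 1.0]

soft_label = ensemble_soft_label(teacher_logits)
loss = kl_divergence(soft_label, softmax(student_logits))
print("soft label:", soft_label)
print("similarity loss:", loss)
```

Note that no one-hot label appears anywhere in this loss, which matches the abstract's third perspective: the averaged teacher output is the only supervision signal.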
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Knowledge Distillation · Weight Decay · Softmax
