Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Linfeng Zhang; Jiebo Song; Anni Gao; Jingwei Chen; Chenglong Bao; Kaisheng Ma

arXiv:1905.08094 · cs.LG · May 21, 2019 · 83 cites

TL;DR

This paper introduces a self distillation framework that improves CNN accuracy by internal knowledge transfer, enabling smaller networks to perform better without increasing their size or complexity.

Contribution

Unlike traditional knowledge distillation, which transfers knowledge from a separate pre-trained teacher network to a student network, the proposed self distillation method distills knowledge within a single network, improving accuracy by shrinking the network rather than enlarging it.

Findings

01 Average accuracy improvement of 2.65% across tested models

02 Maximum improvement of 4.07% in VGG19

03 Enables scalable inference on resource-limited devices

Abstract

Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy through either deeper or wider network structures, which brings with them the exponential increment of the computational and storage cost, delaying the responding time. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural networks through shrinking the size of the network rather than aggrandizing it. Different from traditional knowledge distillation - a knowledge transformation methodology among networks, which forces student neural networks to approximate the softmax layer outputs of pre-trained teacher neural networks, the proposed self distillation…
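The abstract's description of traditional knowledge distillation (a student approximating the softmax outputs of a pre-trained teacher) can be illustrated with a minimal sketch. This is not the paper's self distillation method, whose details are truncated above. The function names, the use of KL divergence as the matching criterion, and the `temperature` parameter are assumptions made for illustration only.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits.

    `temperature` is an illustrative assumption; values above 1
    soften the distribution.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) between the two softmax distributions.

    Assumed criterion: the student is penalized for deviating from
    the teacher's softmax output.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, teacher))          # identical outputs
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # mismatched outputs
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels.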

Code & Models

Repositories

luanyunteng/pytorch-be-your-own-teacher
pytorch

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Knowledge Distillation · Average Pooling · ResNeXt Block · Grouped Convolution · Global Average Pooling · Residual Connection · Kaiming Initialization · 1x1 Convolution · Convolution