Distilling the Knowledge in a Neural Network

Geoffrey Hinton; Oriol Vinyals; Jeff Dean

arXiv:1503.02531 · stat.ML · March 10, 2015 · 14k cites

TL;DR

This paper introduces a method for compressing the knowledge in an ensemble of models into a single neural network that is much easier to deploy, with results reported on MNIST and on the acoustic model of a heavily used commercial system.

Contribution

The paper develops a distillation technique that transfers the knowledge in an ensemble into a single model, building on earlier model-compression work by Caruana and his collaborators. It also introduces a new type of ensemble that includes specialist models.

Findings

01 Achieved some surprising results on MNIST

02 Significantly improved the acoustic model of a heavily used commercial system

03 Demonstrated rapid training of specialist models

Abstract

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble…
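
The two ideas in the abstract, averaging the predictions of several models and then training one model to reproduce that average, can be sketched in a few lines. The truncated abstract above does not spell out the compression technique, so this sketch assumes the commonly described form of distillation: the teacher's probabilities are softened with a temperature, and the student is trained on a weighted mix of a soft-target loss and the usual hard-label loss. The function names, the `temperature` and `alpha` parameters, and all numeric values are illustrative assumptions, not taken from this page.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits; a temperature above 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_average(per_model_probs):
    """Average the class probabilities that several models predict for one example."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

def distillation_loss(student_logits, teacher_soft_probs, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target loss and a hard-label loss (assumed form).

    alpha and temperature are hypothetical hyperparameters; the soft term is
    scaled by temperature**2 so its gradient magnitude stays comparable to
    the hard term when the temperature changes.
    """
    soft_student = softmax(student_logits, temperature)
    soft_loss = cross_entropy(teacher_soft_probs, soft_student)
    hard_student = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_student[hard_label] + 1e-12)
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss

# Usage: two hypothetical teacher models, one example, three classes.
teacher_logits = [[5.0, 2.0, 0.1], [4.0, 3.0, 0.2]]
teacher_soft = ensemble_average([softmax(z, 2.0) for z in teacher_logits])
loss = distillation_loss([3.0, 1.0, 0.5], teacher_soft, hard_label=0)
```

In a real training loop the loss would be minimized with respect to the student's parameters by an autodiff framework; the plain-Python version here only shows how the targets and the loss are formed.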

Code & Models

Datasets

Aneerudh/parsed_arxiv_cs_papers · dataset · 11 dl

Videos

Distilling the Knowledge in a Neural Network · YouTube

Taxonomy

Topics: Neural Networks and Applications · Time Series Analysis and Forecasting · Topic Modeling

Methods: Knowledge Distillation