Moonshine: Distilling with Cheap Convolutions

Elliot J. Crowley; Gavin Gray; Amos Storkey

arXiv:1711.02613 · stat.ML · January 18, 2019 · 49 cites

TL;DR

This paper introduces a method for reducing neural network memory usage through structural distillation, creating a student model that is a simple transformation of the teacher, with minimal accuracy loss.

Contribution

It presents a novel structural distillation approach that simplifies the creation of memory-efficient models without redesigning architectures or tuning hyperparameters.

Findings

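01 Substantial memory savings are possible with very little loss of accuracy.

02 Distilled student networks outperform the same student architecture trained directly on data.

03 Pareto curves and tables for residual networks on four benchmark datasets show the memory versus accuracy payoff.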
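
To make the Pareto idea concrete, here is a minimal sketch of how a memory versus error Pareto front could be extracted from a set of candidate student models. The function name `pareto_front`, the tuple layout, and all numbers are hypothetical placeholders chosen for illustration; none of them are results or code from the paper.

```python
from typing import List, Tuple

# (label, parameter count in millions, test error in %); layout is an assumption.
Point = Tuple[str, float, float]


def pareto_front(models: List[Point]) -> List[Point]:
    """Return the models that no other model beats on both memory and error."""
    front: List[Point] = []
    # Visit models from smallest to largest, breaking ties by lower error.
    for name, mem, err in sorted(models, key=lambda m: (m[1], m[2])):
        # A larger model is only worth keeping if it is strictly more accurate
        # than the best model already on the front.
        if not front or err < front[-1][2]:
            front.append((name, mem, err))
    return front


# Placeholder values invented for this example; not taken from the paper.
candidates = [
    ("a", 1.0, 8.0),
    ("b", 2.0, 6.0),
    ("c", 2.5, 7.0),  # larger and less accurate than "b", so dominated
    ("d", 4.0, 5.0),
]

print([name for name, _, _ in pareto_front(candidates)])
```

Models left off the front can be discarded when choosing a deployment target, since some other model is both smaller and at least as accurate.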

Abstract

Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
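
The abstract does not spell out the transformation applied to the teacher, so the following is only a sketch under a stated assumption: that each standard k x k convolution is swapped for a cheaper grouped convolution followed by a 1 x 1 pointwise convolution. It counts weights to show where the memory saving would come from. The helper names (`conv_params`, `standard_block`, `grouped_pointwise_block`) and the channel width of 64 are illustrative choices, not taken from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution with the given groups (bias omitted)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def standard_block(channels: int, k: int = 3) -> int:
    """Weights in one full k x k convolution, as in the assumed teacher block."""
    return conv_params(channels, channels, k)


def grouped_pointwise_block(channels: int, groups: int, k: int = 3) -> int:
    """Assumed cheap substitute: grouped k x k conv, then 1 x 1 conv to mix channels."""
    grouped = conv_params(channels, channels, k, groups=groups)
    pointwise = conv_params(channels, channels, 1)
    return grouped + pointwise


teacher = standard_block(64)
student_g4 = grouped_pointwise_block(64, groups=4)
student_depthwise = grouped_pointwise_block(64, groups=64)

print(teacher, student_g4, student_depthwise)
print(f"reduction with groups=64: {teacher / student_depthwise:.2f}x")
```

Because the substitution is applied layer by layer, the student keeps the teacher's depth and layout, which is consistent with the abstract's claim that no redesign is needed and the same hyperparameters can be reused.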

Code & Models

Repositories

BayesWatch/pytorch-moonshine (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection