Neighbourhood Distillation: On the benefits of non end-to-end distillation

Laëtitia Shao; Max Moroz; Elad Eban; Yair Movshovitz-Attias

arXiv:2010.01189·cs.LG·October 12, 2020

Open Access

TL;DR

This paper introduces a non end-to-end approach to knowledge distillation: a network is split into smaller sub-networks (neighbourhoods) that are distilled independently, which improves training efficiency, reusability, and simplicity, especially for large models.

Contribution

The paper proposes Neighbourhood Distillation, a method that breaks away from end-to-end training and enables parallel training, reuse of trained neighbourhoods, and easier training with synthetic data.

Findings

1. Speeds up knowledge distillation through parallelism.

2. Facilitates neural architecture search by reusing neighborhoods.

3. Easier training of smaller networks with synthetic data.

Abstract

End-to-end training with back-propagation is the standard method for training deep neural networks. However, as networks become deeper and bigger, end-to-end training becomes more challenging: highly non-convex models easily get stuck in local optima, gradient signals are prone to vanish or explode during back-propagation, and training requires significant computational resources and time. In this work, we propose to break away from the end-to-end paradigm in the context of Knowledge Distillation. Instead of distilling a model end-to-end, we propose to split it into smaller sub-networks - also called neighbourhoods - that are then trained independently. We empirically show that distilling networks in a non end-to-end fashion can be beneficial in a diverse range of use cases. First, we show that it speeds up Knowledge Distillation by exploiting parallelism and training on smaller networks. Second, we…
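The idea of splitting a teacher into neighbourhoods and training each student piece independently can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each student neighbourhood is trained to reproduce its teacher block's output from the teacher activation entering that block, and it uses toy two-layer blocks with plain gradient descent in NumPy. All names (`make_block`, `forward`, `distill_neighbourhood`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def make_block(rng, dim, hidden):
    """Toy two-layer block: x -> tanh(x @ w1) @ w2 (hypothetical architecture)."""
    return {"w1": rng.normal(0, 1 / np.sqrt(dim), (dim, hidden)),
            "w2": rng.normal(0, 1 / np.sqrt(hidden), (hidden, dim))}

def forward(block, x):
    return np.tanh(x @ block["w1"]) @ block["w2"]

def distill_neighbourhood(teacher_block, x_in, hidden, steps=500, lr=0.1, seed=0):
    """Fit a smaller student block to one teacher block, in isolation.

    Assumption: the target is the teacher block's output on x_in, and the
    loss is the mean squared error between student and teacher outputs.
    """
    rng = np.random.default_rng(seed)
    student = make_block(rng, x_in.shape[1], hidden)
    target = forward(teacher_block, x_in)
    losses = []
    for _ in range(steps):
        h = np.tanh(x_in @ student["w1"])
        err = h @ student["w2"] - target
        losses.append(float(np.mean(err ** 2)))
        # Manual back-propagation through this one small block only.
        grad_out = 2 * err / err.size
        grad_w2 = h.T @ grad_out
        grad_h = grad_out @ student["w2"].T
        grad_w1 = x_in.T @ (grad_h * (1 - h ** 2))
        student["w1"] -= lr * grad_w1
        student["w2"] -= lr * grad_w2
    losses.append(float(np.mean((forward(student, x_in) - target) ** 2)))
    return student, losses

rng = np.random.default_rng(0)
DIM, TEACHER_HIDDEN, STUDENT_HIDDEN = 8, 16, 4
teacher = [make_block(rng, DIM, TEACHER_HIDDEN) for _ in range(3)]

# Record the teacher activation that enters each neighbourhood.
x = rng.normal(size=(256, DIM))
block_inputs = []
for block in teacher:
    block_inputs.append(x)
    x = forward(block, x)

# Each call depends only on its own (teacher block, input) pair,
# so these calls could be dispatched to separate workers.
results = [distill_neighbourhood(t, x_in, STUDENT_HIDDEN, seed=i)
           for i, (t, x_in) in enumerate(zip(teacher, block_inputs))]
students = [s for s, _ in results]
histories = [h for _, h in results]

# Stitch the independently trained neighbourhoods into one student network.
y = block_inputs[0]
for s in students:
    y = forward(s, y)
```

Because no gradient flows between neighbourhoods, the loop over blocks has no sequential dependency once the teacher activations are recorded. The stitched student may still accumulate error across blocks, which this sketch does not measure.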

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation