Model compression via distillation and quantization

Antonio Polino; Razvan Pascanu; Dan Alistarh

arXiv:1802.05668 · cs.NE · February 16, 2018 · 262 cites

TL;DR

This paper introduces two novel methods for compressing deep neural networks by combining quantization and distillation, enabling efficient deployment on resource-limited devices without significant accuracy loss.

Contribution

The paper proposes quantized distillation and differentiable quantization, new techniques that jointly optimize weight quantization and knowledge transfer from larger models.
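As a rough illustration of what quantizing weights to a limited set of levels can look like, here is a minimal NumPy sketch of uniform quantization. The function name `uniform_quantize`, the per-tensor min/max scaling, and the deterministic rounding are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def uniform_quantize(weights, bits=4):
    """Snap each weight to the nearest of 2**bits evenly spaced levels.

    Illustrative assumptions: levels span the tensor's own [min, max]
    range, and rounding is deterministic (nearest level).
    """
    w = np.asarray(weights, dtype=np.float64)
    lo, hi = w.min(), w.max()
    if hi == lo:
        # A constant tensor is already representable by a single level.
        return w.copy()
    steps = 2 ** bits - 1                      # number of intervals between levels
    scaled = (w - lo) / (hi - lo)              # rescale weights into [0, 1]
    snapped = np.round(scaled * steps) / steps # nearest level in [0, 1]
    return snapped * (hi - lo) + lo            # map back to the original range

# Hypothetical weights, quantized to 2 bits (at most 4 distinct values).
w = np.array([-1.0, -0.3, 0.0, 0.4, 1.0])
q = uniform_quantize(w, bits=2)
```

With fewer bits the set of representable values shrinks, which is where the storage savings come from; the accuracy cost of that rounding is what the distillation signal is meant to offset.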

Findings

01 Quantized shallow students reach accuracy similar to full-precision teacher models

02 Order-of-magnitude compression, with inference speedup linear in the depth reduction

03 Deployment of DNNs in resource-constrained environments such as mobile or embedded devices

Abstract

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks. The first method we propose is called quantized distillation and leverages distillation during the training process, by incorporating distillation loss, expressed with respect to the teacher, into the training of a student network whose weights are quantized to a limited set of levels. The second method, differentiable quantization, optimizes the location of quantization points…
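The distillation loss mentioned in the abstract can be sketched in NumPy as follows. This is a generic temperature-softened formulation, not necessarily the exact loss used in the paper: the names `log_softmax` and `distillation_loss`, the `temperature` and `alpha` parameters, and the mixing of a soft teacher term with a hard label term are all assumptions of this example. In quantized distillation, the student logits would come from a forward pass through the student with quantized weights.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft (teacher) and a hard (label) cross-entropy.

    alpha and temperature are hypothetical hyperparameters for this sketch.
    """
    labels = np.asarray(labels)
    # Soft targets: teacher probabilities at the given temperature.
    p_teacher = np.exp(log_softmax(teacher_logits, temperature))
    # Cross-entropy between teacher and student softened distributions.
    soft = -(p_teacher * log_softmax(student_logits, temperature)).sum(axis=-1).mean()
    # Standard cross-entropy against the ground-truth labels.
    log_p = log_softmax(student_logits)
    hard = -log_p[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1.0 - alpha) * hard

# Hypothetical logits for a batch of two examples and three classes.
teacher = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
labels = np.array([0, 1])
agreeing_student = teacher.copy()
disagreeing_student = np.array([[0.0, 5.0, 0.0], [5.0, 0.0, 0.0]])

loss_agree = distillation_loss(agreeing_student, teacher, labels)
loss_disagree = distillation_loss(disagreeing_student, teacher, labels)
```

A student whose predictions match the teacher receives a lower loss than one that disagrees, which is the signal that guides the smaller network during training.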

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning