A Survey of Model Compression and Acceleration for Deep Neural Networks

Yu Cheng; Duo Wang; Pan Zhou; Tao Zhang

arXiv:1710.09282 · cs.LG · June 16, 2020 · 882 cites

TL;DR

This survey reviews recent techniques for compressing and accelerating deep neural networks, including pruning, quantization, low-rank factorization, and knowledge distillation, to enable deployment in resource-constrained environments.

Contribution

The survey gives a comprehensive overview of recent compression and acceleration methods, analyzes their performance, applications, advantages, and drawbacks, and discusses open challenges and future directions.

Findings

01. Parameter pruning and quantization effectively reduce model size.

02. Low-rank factorization accelerates inference with minimal accuracy loss.

03. Knowledge distillation transfers knowledge to smaller models.
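
To make the three findings concrete, here is a minimal NumPy sketch of one simple variant of each technique on a single weight matrix. It is not taken from the survey: the function names, the 8-bit uniform quantizer, the truncated-SVD factorization, and the temperature-softened cross-entropy are all illustrative assumptions, chosen as common textbook forms rather than as any specific method the paper reviews.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of entries with smallest magnitude.

    Illustrative assumption: unstructured, one-shot pruning with no retraining.
    Entries tied with the threshold are pruned as well.
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)


def quantize_uniform(weights, num_bits=8):
    """Map float weights to unsigned integer codes on a uniform grid."""
    assert 1 <= num_bits <= 8, "this sketch stores codes in uint8"
    levels = 2 ** num_bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from integer codes."""
    return codes.astype(np.float64) * scale + w_min


def low_rank_factorize(weights, rank):
    """Approximate an (m, n) matrix as A @ B with A: (m, rank), B: (rank, n).

    Uses a truncated SVD, so m * n parameters become rank * (m + n).
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between temperature-softened teacher and student outputs."""
    p_teacher = np.exp(log_softmax(teacher_logits, temperature))
    log_p_student = log_softmax(student_logits, temperature)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 32))

    pruned = magnitude_prune(w, sparsity=0.5)
    codes, scale, w_min = quantize_uniform(w, num_bits=8)
    a, b = low_rank_factorize(w, rank=8)

    print("nonzero weights after pruning:", np.count_nonzero(pruned), "of", w.size)
    print("max quantization error:", np.abs(dequantize(codes, scale, w_min) - w).max())
    print("parameters after factorization:", a.size + b.size, "vs", w.size)
```

In practice these steps are usually combined with fine-tuning or retraining to recover accuracy; the sketch omits that and only shows the size-reducing transformation itself.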

Abstract

Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past five years, tremendous progress has been made in this area. In this paper, we review the recent techniques for compacting and accelerating DNN models. In general, these techniques are divided into four categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and quantization are described first, after…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Advanced Data Compression Techniques · Advanced Neural Network Applications

Methods: Pruning