A Survey of Model Compression and Acceleration for Deep Neural Networks
Yu Cheng, Duo Wang, Pan Zhou, Tao Zhang

TL;DR
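This survey reviews recent techniques for compressing and accelerating deep neural networks, grouped into four categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. The goal is to enable deployment on devices with limited memory or in applications with strict latency requirements.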
Contribution
It provides a comprehensive overview of recent methods, analyzes their performance, applications, advantages, and drawbacks, and discusses future challenges and directions.
Findings
Parameter pruning and quantization effectively reduce model size.
Low-rank factorization accelerates inference with minimal accuracy loss.
Knowledge distillation transfers knowledge to smaller models.
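The findings above can be made concrete with a minimal NumPy sketch of one simple variant of each technique. These are illustrative assumptions rather than the specific methods the survey analyzes: magnitude-based pruning, affine 8-bit quantization, truncated-SVD factorization, and a temperature-softened KL distillation loss. All function names and default values (for example `temperature=4.0`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries (assumed variant: global magnitude pruning)."""
    k = int(sparsity * weights.size)  # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)


def quantize_uint8(weights):
    """Affine 8-bit quantization: returns integer codes plus scale and offset."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against constant tensors
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Map 8-bit codes back to approximate floating-point weights."""
    return codes.astype(np.float64) * scale + w_min


def low_rank_factorize(weights, rank):
    """Truncated SVD: an (m x n) matrix is approximated by (m x rank) @ (rank x n)."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


# Toy usage on a random 64 x 128 weight matrix (hypothetical layer shape).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
w_pruned = magnitude_prune(w, sparsity=0.9)   # about 90% of entries set to zero
codes, scale, w_min = quantize_uint8(w)       # 1 byte per weight instead of 4 or 8
a, b = low_rank_factorize(w, rank=8)          # 1,536 parameters instead of 8,192
```

In this sketch, pruning and quantization shrink storage (sparse entries and one byte per weight), while the low-rank factors replace one large matrix product with two smaller ones. How much accuracy each step costs depends on the model and on any fine-tuning that follows, which the sketch does not cover.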
Abstract
Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past five years, tremendous progress has been made in this area. In this paper, we review the recent techniques for compacting and accelerating DNN models. In general, these techniques are divided into four categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and quantization are described first, after…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Data Compression Techniques · Advanced Neural Network Applications
Methods: Pruning
