Deep Model Compression Via Two-Stage Deep Reinforcement Learning
Huixin Zhan, Wei-Ming Lin, and Yongcan Cao

TL;DR
This paper introduces a two-stage deep reinforcement learning approach for CNN model compression, combining pruning and quantization to significantly reduce model size while maintaining or improving accuracy.
Contribution
It proposes a novel DRL-based framework for jointly optimizing pruning and quantization in a two-stage pipeline for CNN compression.
Findings
Achieved a 9x size reduction on CIFAR-10 with a slight accuracy gain.
Reduced VGG-16 size by 33x on ImageNet with no accuracy loss.
Demonstrated effectiveness on CIFAR-10 and ImageNet datasets.
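The findings report size reductions as multiplicative factors. Because pruning removes weights and quantization shrinks each remaining weight, the two stages compound. The following is a minimal sketch of that arithmetic, assuming idealized storage: a float32 baseline, no sparse-index or codebook overhead, and the same keep fraction and bit width in every layer. The function name and the example inputs are hypothetical and are not the values the paper's agent selected.

```python
def compression_ratio(keep_fraction: float, bits: int, base_bits: int = 32) -> float:
    """Ideal size reduction from pruning, then quantizing, a float32 model.

    keep_fraction: fraction of weights surviving the pruning stage (hypothetical).
    bits: bit width after the quantization stage (hypothetical).
    Ignores index/codebook overhead and any layers left uncompressed (assumption).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if bits <= 0:
        raise ValueError("bits must be positive")
    # Pruning contributes 1/keep_fraction; quantization contributes base_bits/bits.
    return (1.0 / keep_fraction) * (base_bits / bits)


print(compression_ratio(0.5, 8))   # keep half the weights at 8 bits -> 8.0
print(compression_ratio(0.25, 4))  # keep a quarter at 4 bits -> 32.0
```

In practice the realized ratio is lower than this ideal, since sparse storage formats and quantization scales add overhead.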
Abstract
Besides accuracy, model size is another important factor for convolutional neural networks (CNNs), given the limited hardware resources in practical applications. For example, deploying deep neural networks on mobile systems requires accurate yet fast CNNs that deliver low latency in classification and object detection. To meet this need, we aim to obtain CNN models with both high test accuracy and small size, addressing the resource constraints of many embedded devices. In particular, this paper proposes a generic reinforcement learning-based model compression approach built on a two-stage pipeline: pruning and quantization. The first stage, pruning, uses deep reinforcement learning (DRL) to co-learn the accuracy and the FLOPs updated after layer-wise channel pruning and element-wise variational pruning via…
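The abstract describes a pruning stage followed by a quantization stage, with the agent observing accuracy and FLOPs after pruning. Below is a minimal NumPy sketch of those two stages on a single convolution weight tensor. It assumes L1-norm channel ranking for pruning and symmetric uniform quantization; these are stand-in techniques, not the paper's DRL agent or its variational pruning. The names `prune_channels`, `quantize_uniform`, `conv_flops`, `prune_ratio` and `n_bits` are hypothetical, and where the paper's agent would choose per-layer values, this sketch takes fixed inputs.

```python
import numpy as np


def prune_channels(weight: np.ndarray, prune_ratio: float) -> tuple[np.ndarray, np.ndarray]:
    """Stage 1 (sketch): drop the output channels with the smallest L1 norm.

    weight has shape (out_channels, in_channels, kh, kw). prune_ratio is a
    hypothetical fixed input standing in for the agent's per-layer action.
    """
    out_channels = weight.shape[0]
    n_keep = max(1, int(round(out_channels * (1.0 - prune_ratio))))
    norms = np.abs(weight).reshape(out_channels, -1).sum(axis=1)
    # Indices of the strongest filters, kept in their original order.
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weight[keep], keep


def quantize_uniform(weight: np.ndarray, n_bits: int) -> tuple[np.ndarray, float]:
    """Stage 2 (sketch): symmetric uniform quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = float(np.max(np.abs(weight)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weight / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def conv_flops(out_c: int, in_c: int, kh: int, kw: int, out_h: int, out_w: int) -> int:
    """Multiply-accumulate count of one conv layer (bias ignored)."""
    return out_c * in_c * kh * kw * out_h * out_w


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)

w_pruned, kept = prune_channels(w, prune_ratio=0.5)
q, scale = quantize_uniform(w_pruned, n_bits=8)
w_hat = q * scale  # dequantized weights, used only to inspect the error

flops_before = conv_flops(64, 32, 3, 3, 16, 16)
flops_after = conv_flops(w_pruned.shape[0], 32, 3, 3, 16, 16)
print(w_pruned.shape, flops_after / flops_before)
print("max quantization error:", float(np.max(np.abs(w_pruned - w_hat))))
```

Removing output channels from one layer also removes the matching input channels of the next layer. A full implementation would therefore slice the following layer's weights with `kept` as well; this sketch tracks a single layer only.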
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Pruning
