Automated Model Compression by Jointly Applied Pruning and Quantization
Wenting Tang, Xingxing Wei, Bo Li

TL;DR
This paper introduces AJPQ, an automated method that unifies pruning and quantization into a single process using AutoML and reinforcement learning, achieving better compression and accuracy trade-offs.
Contribution
It proposes a novel unified framework for joint pruning and quantization, simplifying the compression pipeline and improving efficiency over traditional step-wise methods.
Findings
Reduces model size more than fivefold with minimal accuracy loss.
Achieves twofold speedup in computation with significant size reduction.
Outperforms state-of-the-art automated compression methods.
Abstract
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost to meet the deployment requirements. However, such a step-wise application of pruning and quantization may lead to suboptimal solutions and unnecessary time consumption. In this paper, we tackle this issue by integrating network pruning and quantization as a unified joint compression problem and then use AutoML to automatically solve it. We find the pruning process can be regarded as the channel-wise quantization with 0 bit. Thus, the separate two-step pruning and quantization can be simplified as the one-step quantization with mixed precision. This unification not only simplifies the compression pipeline but also avoids the compression divergence. To implement this idea, we propose the automated model compression by jointly applied…
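The abstract's observation that pruning can be treated as channel-wise quantization with 0 bits can be illustrated with a short sketch. This is not the AJPQ implementation: the function names (quantize_channel, compress_layer, size_in_bits), the min-max uniform quantizer, and the example bit-widths are all assumptions chosen for illustration. The paper describes using AutoML to solve the joint problem; here the per-channel bit-widths are simply set by hand.

```python
from typing import List


def quantize_channel(weights: List[float], bits: int) -> List[float]:
    """Uniformly quantize one channel to 2**bits levels (hypothetical helper).

    A bit-width of 0 leaves no information to store, so the channel is
    zeroed out, which is the same effect as pruning it.
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    if bits == 0:
        return [0.0 for _ in weights]  # 0-bit channel == pruned channel
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # constant channel: nothing to quantize
    steps = 2 ** bits - 1  # number of intervals between lo and hi
    scale = (hi - lo) / steps
    # Snap each weight to the nearest point on the min-max grid.
    return [lo + round((w - lo) / scale) * scale for w in weights]


def compress_layer(channels: List[List[float]],
                   bit_widths: List[int]) -> List[List[float]]:
    """Apply one mixed-precision pass; pruning is just the bits == 0 case."""
    if len(channels) != len(bit_widths):
        raise ValueError("need exactly one bit-width per channel")
    return [quantize_channel(c, b) for c, b in zip(channels, bit_widths)]


def size_in_bits(channels: List[List[float]], bit_widths: List[int]) -> int:
    """Storage cost of the weights only (ignores scale/offset metadata)."""
    return sum(len(c) * b for c, b in zip(channels, bit_widths))


# Example layer with three channels of four weights each (made-up values).
layer = [
    [0.01, -0.02, 0.00, 0.01],   # low-magnitude channel: assign 0 bits
    [0.50, -0.40, 0.30, -0.10],  # moderate channel: assign 2 bits
    [1.20, -0.90, 0.75, 0.33],   # sensitive channel: assign 8 bits
]
bit_widths = [0, 2, 8]

compressed = compress_layer(layer, bit_widths)
full_precision_bits = size_in_bits(layer, [32] * len(layer))
compressed_bits = size_in_bits(layer, bit_widths)
```

In this sketch a single list of bit-widths describes both decisions at once: which channels to drop (0 bits) and how precisely to store the rest, so no separate pruning step is needed.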
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Pruning
