Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu, David Aponte, Colby Banbury, Daniel P. Robinson, Tianyu Ding, Kazuhito Koishida, Ilya Zharkov, Tianyi Chen

TL;DR
The paper introduces GETA, an automated framework for joint structured pruning and quantization of neural networks that reduces model size and improves efficiency without extensive hyperparameter tuning.
Contribution
GETA provides a novel, architecture-agnostic method for simultaneous pruning and quantization, addressing engineering complexity and optimization challenges.
Findings
Achieves competitive or superior compression performance.
Works effectively on CNNs and transformers.
Reduces model size while maintaining accuracy.
Abstract
Structured pruning and quantization are fundamental techniques used to reduce the size of deep neural networks (DNNs) and are typically applied independently. Applying these techniques jointly via co-optimization has the potential to produce smaller, high-quality models. However, existing joint schemes are not widely used because of (1) engineering difficulties (complicated multi-stage processes), (2) black-box optimization (extensive hyperparameter tuning to control the overall compression), and (3) insufficient architecture generalization. To address these limitations, we present the framework GETA, which automatically and efficiently performs joint structured pruning and quantization-aware training on any DNN. GETA introduces three key innovations: (i) a quantization-aware dependency graph (QADG) that constructs a pruning search space for a generic quantization-aware DNN, (ii) a…
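To make the two techniques named in the abstract concrete, the following is a minimal NumPy sketch that applies structured pruning and then uniform quantization to a single weight matrix. It is not GETA's algorithm: it does not build a QADG, does not train, and does not co-optimize the two steps. The function names `prune_output_channels` and `fake_quantize`, the L2-norm pruning criterion, and the 4-bit symmetric quantizer are all assumptions chosen for illustration.

```python
import numpy as np

def prune_output_channels(weight, keep_ratio):
    """Structured pruning: keep whole rows (output channels) with the largest L2 norm."""
    norms = np.linalg.norm(weight, axis=1)
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    # Indices of the n_keep largest-norm rows, restored to their original order.
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weight[keep], keep

def fake_quantize(weight, bits):
    """Symmetric uniform quantization to `bits` bits, returned as dequantized floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(weight))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weight / scale), -qmax, qmax)
    return q * scale, scale

# Hypothetical layer: 8 output channels, 16 inputs, stored as 32-bit floats.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))

W_pruned, kept = prune_output_channels(W, keep_ratio=0.5)
W_q, scale = fake_quantize(W_pruned, bits=4)

dense_bits = W.size * 32        # original storage cost
compressed_bits = W_q.size * 4  # half the channels, 4 bits per weight
print("kept channels:", kept)
print("size reduction:", dense_bits / compressed_bits)
```

In this sketch the two steps run one after the other with hand-picked settings (`keep_ratio`, `bits`), which is the kind of multi-stage, manually tuned process the abstract lists as a limitation of existing schemes.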
Taxonomy
Topics: Neural Networks and Applications
Methods: Pruning
