Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression

Xiaoyi Qu; David Aponte; Colby Banbury; Daniel P. Robinson; Tianyu Ding; Kazuhito Koishida; Ilya Zharkov; Tianyi Chen

arXiv:2502.16638 · cs.LG · February 25, 2025

TL;DR

The paper introduces GETA, an automated framework for joint structured pruning and quantization of neural networks that improves efficiency and reduces model size without extensive hyperparameter tuning.
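Because structured pruning and quantization shrink different factors of weight storage (how many weights are kept versus how many bits each one uses), their savings multiply. The back-of-the-envelope sketch below uses a hypothetical helper, `compressed_size_bytes`, that ignores quantization metadata such as scales and zero points; the parameter count is illustrative and not a result from the paper.

```python
def compressed_size_bytes(num_params: int, keep_ratio: float, bits: int) -> float:
    """Approximate weight storage after structured pruning and uniform quantization.

    Hypothetical helper: ignores quantization metadata (scales, zero points).
    Structured pruning removes whole channels, so the kept weights stay dense
    and need no sparse index storage.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")
    kept = num_params * keep_ratio
    return kept * bits / 8.0


# Illustrative 10M-parameter model (not a number from the paper).
baseline = compressed_size_bytes(10_000_000, 1.0, 32)     # FP32, unpruned
pruned_only = compressed_size_bytes(10_000_000, 0.5, 32)  # half the weights, FP32
quant_only = compressed_size_bytes(10_000_000, 1.0, 8)    # all weights, 8-bit
joint = compressed_size_bytes(10_000_000, 0.5, 8)         # half the weights, 8-bit
print(baseline / joint)  # 8.0
```

Pruning alone gives 2x and 8-bit quantization alone gives 4x in this toy setting; applying both gives 8x, which is why co-optimizing the two is attractive.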

Contribution

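GETA provides a novel, architecture-agnostic method for simultaneous structured pruning and quantization-aware training, addressing the engineering complexity of multi-stage pipelines and the black-box hyperparameter tuning that existing joint schemes require.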

Findings

1. Achieves compression performance competitive with or superior to existing methods.

2. Works on both CNNs and transformers.

3. Reduces model size while maintaining accuracy.

Abstract

Structured pruning and quantization are fundamental techniques used to reduce the size of deep neural networks (DNNs) and are typically applied independently. Applying these techniques jointly via co-optimization has the potential to produce smaller, high-quality models. However, existing joint schemes are not widely used because of (1) engineering difficulties (complicated multi-stage processes), (2) black-box optimization (extensive hyperparameter tuning to control the overall compression), and (3) insufficient architecture generalization. To address these limitations, we present the framework GETA, which automatically and efficiently performs joint structured pruning and quantization-aware training on any DNN. GETA introduces three key innovations: (i) a quantization-aware dependency graph (QADG) that constructs a pruning search space for generic quantization-aware DNNs, (ii) a…
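The core idea of the abstract, removing whole structures and lowering bit width in the same model, can be illustrated on a toy two-layer MLP. This is a minimal sketch under stated assumptions, not GETA's QADG or its optimizer: GETA prunes and quantizes jointly during training, whereas this toy applies group-norm magnitude pruning once and then a symmetric uniform fake quantizer (the quantize-then-dequantize operation used in the forward pass of quantization-aware training). The helper names `prune_hidden_units`, `fake_quantize`, and `prune_then_quantize`, the group L2-norm score, and the single per-tensor scale are all hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def fake_quantize(weight: Matrix, bits: int) -> Matrix:
    """Symmetric uniform quantize-then-dequantize with one per-tensor scale (hypothetical helper)."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for a symmetric signed grid")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for row in weight for w in row), default=0.0)
    if max_abs == 0.0:
        return [list(row) for row in weight]
    scale = max_abs / qmax
    # Round to the nearest integer level, clip to [-qmax, qmax], map back to floats.
    return [[max(-qmax, min(qmax, round(w / scale))) * scale for w in row] for row in weight]


def prune_hidden_units(w1: Matrix, w2: Matrix, keep: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Structured pruning of the hidden units of y = W2 @ act(W1 @ x) (hypothetical helper).

    Hidden unit i owns row i of W1 and column i of W2. Both form one pruning
    group, so they are scored jointly and removed together.
    """
    if not 0 < keep <= len(w1):
        raise ValueError("keep must be in [1, number of hidden units]")

    def group_norm(i: int) -> float:
        return math.sqrt(sum(w * w for w in w1[i]) + sum(row[i] ** 2 for row in w2))

    ranked = sorted(range(len(w1)), key=group_norm, reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original unit order
    new_w1 = [list(w1[i]) for i in kept]
    new_w2 = [[row[i] for i in kept] for row in w2]
    return new_w1, new_w2, kept


def prune_then_quantize(w1: Matrix, w2: Matrix, keep: int, bits: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Prune whole hidden units, then fake-quantize the surviving weights."""
    p1, p2, kept = prune_hidden_units(w1, w2, keep)
    return fake_quantize(p1, bits), fake_quantize(p2, bits), kept


# Toy MLP: 3 inputs -> 4 hidden units -> 2 outputs.
W1 = [
    [0.9, -0.4, 0.1],
    [0.01, 0.02, -0.01],
    [-0.7, 0.3, 0.5],
    [0.05, -0.03, 0.02],
]
W2 = [
    [0.6, 0.01, -0.2, 0.03],
    [-0.3, 0.02, 0.4, -0.01],
]
Q1, Q2, kept = prune_then_quantize(W1, W2, keep=2, bits=4)
print(kept)  # [0, 2]
```

Scoring row i of W1 and column i of W2 as one group hints at why a dependency graph is needed: removing a hidden unit from one layer without removing the matching column in the next would leave the layer shapes inconsistent.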

Code & Models

Repositories

microsoft/geta
jax · Official

Taxonomy

Topics: Neural Networks and Applications

Methods: Pruning