A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM

Shaokai Ye; Tianyun Zhang; Kaiqi Zhang; Jiayu Li; Jiaming Xie; Yun Liang; Sijia Liu; Xue Lin; Yanzhi Wang

arXiv:1811.01907·cs.NE·November 6, 2018·43 cites

TL;DR

This paper introduces a unified ADMM-based framework for DNN weight pruning and clustering/quantization, significantly improving model compression while maintaining accuracy.

Contribution

It develops a systematic, unified optimization framework combining weight pruning and clustering/quantization using ADMM, enabling joint solutions and enhanced compression.

Findings

01 Achieves up to 167x weight reduction in LeNet-5 without accuracy loss.

02 Reaches 1,910x storage reduction in LeNet-5 with combined pruning and quantization.

03 Demonstrates significant improvements over existing methods in model compression.

Abstract

Many model compression techniques for Deep Neural Networks (DNNs) have been investigated, including weight pruning, weight clustering, and quantization. Weight pruning exploits redundancy in the number of weights in a DNN, while weight clustering/quantization exploits redundancy in the number of bits used to represent each weight. The two can be combined to exploit the maximum degree of redundancy. However, the literature lacks a systematic investigation in this direction. In this paper, we fill this void and develop a unified, systematic framework of DNN weight pruning and clustering/quantization using the Alternating Direction Method of Multipliers (ADMM), a powerful technique in optimization theory for dealing with non-convex optimization problems. Both DNN weight pruning and clustering/quantization, as well as their combinations, can be solved in a…
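
The abstract does not spell out the ADMM updates, so the following is only a minimal sketch under assumptions, not the authors' implementation. It assumes the common ADMM splitting in which the weights W get an auxiliary copy Z constrained to a set S (at most `keep` nonzeros for pruning, or a fixed list of `levels` for quantization), so the Z-update reduces to a Euclidean projection onto S. The function names, the toy quadratic loss, and the hyperparameters `lr` and `rho` are all hypothetical.

```python
def project_sparse(weights, keep):
    """Projection onto vectors with at most `keep` nonzeros:
    keep the largest-magnitude entries and zero out the rest."""
    if keep <= 0:
        return [0.0] * len(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def project_levels(weights, levels):
    """Projection onto a fixed set of quantization levels:
    snap each weight to its nearest level."""
    return [min(levels, key=lambda q: abs(w - q)) for w in weights]


def admm_step(w, z, u, grad, lr, rho, project):
    """One (assumed) ADMM iteration for: min f(W) s.t. W = Z, Z in S."""
    # W-update: one gradient step on f(W) + (rho / 2) * ||W - Z + U||^2
    g = grad(w)
    w = [wi - lr * (gi + rho * (wi - zi + ui)) for wi, gi, zi, ui in zip(w, g, z, u)]
    # Z-update: project W + U onto the constraint set S
    z = project([wi + ui for wi, ui in zip(w, u)])
    # U-update: accumulate the scaled constraint residual W - Z
    u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z, u


# Toy stand-in for a training loss: f(W) = 0.5 * ||W - target||^2
target = [0.9, -0.05, 0.4, 0.02]


def toy_grad(w):
    return [wi - ti for wi, ti in zip(w, target)]


w, z, u = [0.0] * 4, [0.0] * 4, [0.0] * 4
for _ in range(200):
    w, z, u = admm_step(w, z, u, toy_grad, lr=0.1, rho=1.0,
                        project=lambda v: project_sparse(v, keep=2))
```

Swapping `project_sparse` for `project_levels` in the `project` argument turns the same loop into a quantization sketch, which is one way to read the abstract's claim that pruning and quantization fit a single framework.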

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning

Methods: Pruning · 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax