Pruning Ternary Quantization

Dan Liu; Xi Chen; Jie Fu; Chen Ma; Xue Liu

arXiv:2107.10998 · cs.CV · July 18, 2023 · 1 citation

TL;DR

This paper introduces pruning ternary quantization (PTQ), a novel method that effectively compresses deep neural networks by integrating normalization, pruning, and weight decay to produce highly accurate, low-bit ternary models with significant size reduction.

Contribution

The paper proposes a new symmetric ternary quantization method that simultaneously optimizes bit-width, model size, and accuracy, outperforming existing approaches in compression and accuracy.

Findings

01 Achieves 49× compression of ResNet-18 with only a 4% accuracy drop.

02 Compresses Mask R-CNN from 170MB to 5MB with a 2.8% AP drop.

03 Validated on various networks and tasks, demonstrating broad applicability.
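
As a quick check on the size figures quoted in the findings, a compression ratio is the original size divided by the compressed size. The sketch below applies that to the Mask R-CNN numbers only; the function name `compression_ratio` is a hypothetical helper, and the 49× ResNet-18 figure is not recomputed because this page does not give its before and after sizes.

```python
def compression_ratio(original_mb, compressed_mb):
    """Return how many times smaller the compressed model is (hypothetical helper)."""
    if compressed_mb <= 0:
        raise ValueError("compressed size must be positive")
    return original_mb / compressed_mb


# Sizes taken from the Mask R-CNN finding: 170MB before, 5MB after.
mask_rcnn_ratio = compression_ratio(170, 5)
print(f"Mask R-CNN: {mask_rcnn_ratio:.0f}x smaller")  # prints "Mask R-CNN: 34x smaller"
```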

Abstract

Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit quantization aims at obtaining a faster model; weight sharing quantization aims at improving compression ratio and accuracy; and mixed-precision quantization aims at balancing accuracy and inference time. To simultaneously optimize bit-width, model size, and accuracy, we propose pruning ternary quantization (PTQ): a simple, effective, symmetric ternary quantization method. We integrate L2 normalization, pruning, and the weight decay term to reduce the weight discrepancy in the gradient estimator during quantization, thus producing highly compressed ternary weights. Our method brings the highest test accuracy and the highest compression ratio. For…
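
To make the idea of symmetric ternary weights concrete, here is a minimal sketch in plain Python. It L2-normalizes a flat weight list, then maps each weight to -1, 0, or +1 using a single symmetric threshold, so that small weights are pruned to zero. This is an illustration under stated assumptions, not the authors' algorithm: the threshold value `0.2`, the example weights, and the helper names `l2_normalize`, `ternarize`, and `sparsity` are all hypothetical, and the gradient estimator and weight decay term mentioned in the abstract are not modeled.

```python
import math


def l2_normalize(weights):
    """Scale a flat weight list to unit L2 norm (all-zero input is returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        return list(weights)
    return [w / norm for w in weights]


def ternarize(weights, threshold):
    """Symmetric ternary projection: |w| <= threshold becomes 0, the rest become +1 or -1."""
    codes = []
    for w in weights:
        if w > threshold:
            codes.append(1)
        elif w < -threshold:
            codes.append(-1)
        else:
            codes.append(0)  # pruned weight
    return codes


def sparsity(codes):
    """Fraction of weights that were pruned to zero."""
    return codes.count(0) / len(codes)


# Assumed example weights and threshold, chosen only for illustration.
weights = [0.9, -0.05, 0.4, -0.7, 0.02, 0.1]
normalized = l2_normalize(weights)
codes = ternarize(normalized, threshold=0.2)

print(codes)            # prints [1, 0, 1, -1, 0, 0]
print(sparsity(codes))  # prints 0.5
```

In this toy setting, half of the weights fall inside the threshold and are stored as zeros, which hints at why combining pruning with a ternary code can shrink a model beyond what the low bit-width alone provides.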

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Pruning · Region Proposal Network · Pointwise Convolution · Depthwise Convolution · Batch Normalization · Depthwise Separable Convolution · Average Pooling · Inverted Residual Block · 1x1 Convolution · Weight Decay