Pruning Ternary Quantization
Dan Liu, Xi Chen, Jie Fu, Chen Ma, Xue Liu

TL;DR
This paper introduces pruning ternary quantization (PTQ), a symmetric ternary quantization method that compresses deep neural networks by combining L2 normalization, pruning, and weight decay, producing low-bit ternary models with a large reduction in size and a small loss in accuracy.
Contribution
The paper proposes a new symmetric ternary quantization method that simultaneously optimizes bit-width, model size, and accuracy, outperforming existing approaches in compression and accuracy.
Findings
Achieves 49× compression of ResNet-18 with only a 4% accuracy drop.
Compresses Mask R-CNN from 170MB to 5MB (34×) with a 2.8% drop in average precision (AP).
Validated on several networks and tasks, which suggests broad applicability.
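The compression figures above can be checked with simple arithmetic. The sketch below is only an illustration: the helper names `compression_ratio` and `dense_bitwidth_ratio` are hypothetical and do not come from the paper, and the 32-bit baseline is an assumption about how the uncompressed weights are stored.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original to compressed size (both in the same unit)."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


def dense_bitwidth_ratio(original_bits=32, quantized_bits=2):
    """Ratio obtained from bit-width reduction alone, with no sparsity.

    Assumes every weight is stored densely at a fixed bit-width.
    """
    return original_bits / quantized_bits


# Mask R-CNN figures quoted in the findings: 170MB -> 5MB.
mask_rcnn_ratio = compression_ratio(170, 5)

# Dense 2-bit storage of 32-bit weights (assumed baseline).
dense_ratio = dense_bitwidth_ratio(32, 2)

print(mask_rcnn_ratio)  # 34.0
print(dense_ratio)      # 16.0
```

Under that assumed 32-bit baseline, dense 2-bit storage alone accounts for a 16× reduction, so the reported 34× and 49× ratios would have to include further savings, plausibly from the sparsity that pruning introduces. That reading is an inference from the arithmetic, not a statement made in this summary.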
Abstract
Inference time, model size, and accuracy are three key factors in deep model compression. Most of the existing work addresses these three key factors separately as it is difficult to optimize them all at the same time. For example, low-bit quantization aims at obtaining a faster model; weight sharing quantization aims at improving compression ratio and accuracy; and mixed-precision quantization aims at balancing accuracy and inference time. To simultaneously optimize bit-width, model size, and accuracy, we propose pruning ternary quantization (PTQ): a simple, effective, symmetric ternary quantization method. We integrate L2 normalization, pruning, and the weight decay term to reduce the weight discrepancy in the gradient estimator during quantization, thus producing highly compressed ternary weights. Our method brings the highest test accuracy and the highest compression ratio. For…
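To make the idea of symmetric ternary quantization concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a fixed magnitude threshold applied to L2-normalized weights and a single scale `alpha` taken as the mean magnitude of the surviving weights; the function name `ternarize`, the threshold value 0.3, and the choice of `alpha` are all illustrative assumptions. The weight decay term and the gradient estimator mentioned in the abstract belong to training and are not modeled here.

```python
import numpy as np


def ternarize(weights, threshold=0.3):
    """Map a weight vector to codes in {-1, 0, +1} plus one shared scale.

    Hypothetical helper for illustration only:
    - weights are L2-normalized so the threshold is scale-invariant,
    - entries whose normalized magnitude is <= threshold are pruned to 0,
    - alpha is the mean magnitude of the kept (original) weights.
    """
    w = np.asarray(weights, dtype=np.float64)
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return np.zeros(w.shape, dtype=np.int8), 0.0

    w_hat = w / norm                    # unit L2 norm
    keep = np.abs(w_hat) > threshold    # pruning mask
    codes = (np.sign(w_hat) * keep).astype(np.int8)

    alpha = float(np.abs(w[keep]).mean()) if keep.any() else 0.0
    return codes, alpha


w = np.array([0.9, -0.05, 0.4, -0.8, 0.02, -0.6])
codes, alpha = ternarize(w)
sparsity = float(np.mean(codes == 0))   # fraction of pruned weights
w_q = alpha * codes                     # dequantized ternary weights

print(codes.tolist())  # [1, 0, 0, -1, 0, -1]
print(sparsity)        # 0.5
```

Because the same `alpha` is used for positive and negative codes, the quantized values are symmetric around zero: each weight becomes `-alpha`, `0`, or `+alpha`. Normalizing first means rescaling the input, for example `ternarize(10 * w)`, yields the same codes.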
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Pruning · Region Proposal Network · Pointwise Convolution · Depthwise Convolution · Batch Normalization · Depthwise Separable Convolution · Average Pooling · Inverted Residual Block · 1x1 Convolution · Weight Decay
