Universal Deep Neural Network Compression
Yoojin Choi, Mostafa El-Khamy, Jungwon Lee

TL;DR
This paper introduces a universal approach to compressing deep neural networks using vector quantization and source coding, achieving compression ratios of 47.1x (ResNet on CIFAR-10) and 42.5x (AlexNet on ImageNet) without prior knowledge of the weight distribution.
Contribution
It pioneers universal DNN compression with randomized lattice quantization and fine-tuning, outperforming previous non-universal methods.
Findings
Achieved 47.1x compression for the 32-layer ResNet on CIFAR-10.
Achieved 42.5x compression for AlexNet on ImageNet.
Randomized lattice quantization can perform near-optimally on any source without relying on knowledge of its probability distribution.
Abstract
In this paper, we investigate lossy compression of deep neural networks (DNNs) by weight quantization and lossless source coding for memory-efficient deployment. Whereas the previous work addressed non-universal scalar quantization and entropy coding of DNN weights, we for the first time introduce universal DNN compression by universal vector quantization and universal source coding. In particular, we examine universal randomized lattice quantization of DNNs, which randomizes DNN weights by uniform random dithering before lattice quantization and can perform near-optimally on any source without relying on knowledge of its probability distribution. Moreover, we present a method of fine-tuning vector quantized DNNs to recover the performance loss after quantization. Our experimental results show that the proposed universal DNN compression scheme compresses the 32-layer ResNet (trained on…
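The dithering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplest possible lattice (a scaled cubic lattice, so quantization reduces to rounding each coordinate), a dither drawn uniformly over one lattice cell, and a decoder that regenerates the same dither from a shared seed and subtracts it. The function names, the `step`, `dim`, and `seed` parameters, and the zero-padding of the last vector are all hypothetical choices made for illustration.

```python
import numpy as np


def dithered_lattice_quantize(weights, step, dim=1, seed=0):
    """Quantize weights on a scaled cubic lattice after adding uniform dither.

    Assumption: the lattice is step * Z^dim, so the nearest lattice point is
    found by rounding each coordinate independently.
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(weights, dtype=np.float64).ravel()
    pad = (-flat.size) % dim  # zero-pad so the weights split into dim-sized vectors
    vectors = np.concatenate([flat, np.zeros(pad)]).reshape(-1, dim)
    # Dither is uniform over one lattice cell: [-step/2, step/2) per coordinate.
    dither = rng.uniform(-step / 2, step / 2, size=vectors.shape)
    # Integer lattice indices; these are what a lossless source coder would compress.
    return np.round((vectors + dither) / step).astype(np.int64)


def dithered_lattice_dequantize(indices, step, shape, seed=0):
    """Rebuild weights by regenerating the same dither and subtracting it."""
    rng = np.random.default_rng(seed)
    dither = rng.uniform(-step / 2, step / 2, size=indices.shape)
    recon = indices * step - dither
    n = int(np.prod(shape))
    return recon.ravel()[:n].reshape(shape)  # drop the zero-padding


w = np.random.default_rng(42).normal(size=(4, 5))
idx = dithered_lattice_quantize(w, step=0.1, dim=3, seed=7)
w_hat = dithered_lattice_dequantize(idx, step=0.1, shape=w.shape, seed=7)
```

In this sketch the reconstruction error of every weight is bounded by half the step size, whatever distribution the weights follow. The integer indices would then be passed to a lossless source coder, and the fine-tuning stage mentioned in the abstract is not shown.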
Taxonomy
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
