Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks

Julieta Martinez; Jashan Shewakramani; Ting Wei Liu; Ioan Andrei Bârsan; Wenyuan Zeng; Raquel Urtasun

arXiv:2010.15703·cs.CV·April 13, 2021

TL;DR

This paper combines weight permutation with vector quantization and fine-tuning to improve neural network compression, especially for modern architectures, and narrows the accuracy gap to the uncompressed model across vision tasks.

Contribution

The paper proposes permuting network weights so that they are easier to compress with vector quantization, and connects the search for good permutations to rate-distortion theory.
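
The permutation idea rests on a standard property of layered networks: reordering the hidden units of one layer, together with the matching inputs of the next layer, leaves the computed function unchanged. The sketch below checks this numerically. It is a minimal illustration that assumes a two-layer network with an elementwise ReLU; the helper names (`mlp`, `permute_hidden`) are hypothetical, and the code does not reproduce the paper's permutation search.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mlp(W1, W2, x):
    # Two-layer network: y = W2 @ relu(W1 @ x).
    return matvec(W2, relu(matvec(W1, x)))

def permute_hidden(W1, W2, perm):
    # Reorder the rows of W1 (hidden units) and the matching columns of W2.
    W1p = [W1[j] for j in perm]
    W2p = [[row[j] for j in perm] for row in W2]
    return W1p, W2p

rng = random.Random(0)
d_in, d_hidden, d_out = 4, 6, 3
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

perm = list(range(d_hidden))
rng.shuffle(perm)
W1p, W2p = permute_hidden(W1, W2, perm)

y = mlp(W1, W2, x)
yp = mlp(W1p, W2p, x)
# The outputs agree up to floating-point summation order.
max_diff = max(abs(a - b) for a, b in zip(y, yp))
```

Because the permuted weights define the same function, the ordering is a free choice, and some orderings may group parameters in a way that is easier to quantize.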

Findings

01 Reduces the accuracy gap with the uncompressed model by 40-70% relative to the prior state of the art

02 Improves compression for pointwise convolutions and linear layers

03 Achieves higher final accuracy with an annealed quantization algorithm

Abstract

Compressing large neural networks is an important step for their deployment in resource-constrained computational platforms. In this context, vector quantization is an appealing framework that expresses multiple parameters using a single code, and has recently achieved state-of-the-art network compression on a range of core vision and natural language processing tasks. Key to the success of vector quantization is deciding which parameter groups should be compressed together. Previous work has relied on heuristics that group the spatial dimension of individual convolutional filters, but a general solution remains unaddressed. This is desirable for pointwise convolutions (which dominate modern architectures), linear layers (which have no notion of spatial dimension), and convolutions (when more than one filter is compressed to the same codeword). In this paper we make the observation that…
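
To make "expresses multiple parameters using a single code" concrete, the following is a minimal sketch of vector quantization applied to a weight matrix. It assumes plain k-means over fixed-length subvectors cut from each row; the function names, the toy sizes (`d = 4`, `k = 8`), and the random weights are illustrative assumptions, not the paper's grouping strategy or its annealed algorithm.

```python
import random

def split_subvectors(W, d):
    # Cut every row of W into consecutive chunks of length d.
    return [row[i:i + d] for row in W for i in range(0, len(row), d)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    # Index of the codeword closest to v.
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], v))

def kmeans(vectors, k, iters=10, seed=0):
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        codes = [nearest(codebook, v) for v in vectors]
        for j in range(k):
            members = [v for v, c in zip(vectors, codes) if c == j]
            if members:  # keep the old codeword if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    codes = [nearest(codebook, v) for v in vectors]
    return codebook, codes

def reconstruct(codebook, codes, n_cols, d):
    # Rebuild the matrix by replacing each code with its codeword.
    per_row = n_cols // d
    rows = []
    for r in range(len(codes) // per_row):
        row = []
        for c in codes[r * per_row:(r + 1) * per_row]:
            row.extend(codebook[c])
        rows.append(row)
    return rows

rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]  # toy 16x8 layer
d, k = 4, 8
subvectors = split_subvectors(W, d)       # 32 subvectors of length 4
codebook, codes = kmeans(subvectors, k)   # 8 codewords, one code per subvector
W_hat = reconstruct(codebook, codes, n_cols=8, d=d)
```

Here the 128 weights are represented by 32 codes of 3 bits each plus a shared codebook of 8 x 4 values. Which parameters end up in the same subvector determines how well the codebook can fit them, which is why the choice of grouping matters.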

Code & Models

Repositories

uber-research/permute-quantize-finetune
PyTorch, official implementation