Hyperspherical Quantization: Toward Smaller and More Accurate Models

Dan Liu; Xi Chen; Chen Ma; Xue Liu

arXiv:2212.12653 · cs.CV · December 27, 2022

TL;DR

This paper introduces Hyperspherical Quantization, a novel ternary quantization framework that leverages hyperspherical learning to produce smaller, more accurate neural network models suitable for resource-limited devices.

Contribution

It proposes a new hyperspherical learning-based ternary quantization method that improves accuracy and reduces model size compared to existing quantization techniques.

Findings

01

Significantly improves test accuracy at similar compression levels.

02

Reduces model size by up to 40 times.

03

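Reduces the bias of the straight-through gradient estimator in ternary quantization by lowering the cosine distance between full-precision and ternary weights.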

Abstract

Model quantization enables the deployment of deep neural networks on resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size by up to $32\times$, but at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning, and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work…
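
To make the quantity named in the abstract concrete, here is a minimal NumPy sketch that ternarizes a weight tensor and measures the cosine distance between the full-precision and ternary versions. It is an illustration under stated assumptions, not the HQ method: the function names `ternarize` and `cosine_distance` are hypothetical, and the threshold rule (0.7 times the mean absolute weight) is a heuristic borrowed from Ternary Weight Networks rather than anything the abstract specifies.

```python
import numpy as np

def ternarize(w, delta_scale=0.7):
    """Map weights to {-1, 0, +1} using a magnitude threshold.

    ASSUMPTION: delta = delta_scale * mean(|w|) is the Ternary Weight
    Networks heuristic, used here only for illustration; it is not
    claimed to be the thresholding rule of HQ.
    """
    delta = delta_scale * np.mean(np.abs(w))
    return np.sign(w) * (np.abs(w) > delta)

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two tensors, flattened."""
    a, b = a.ravel(), b.ravel()
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))   # stand-in for one layer's weights
w_unit = w / np.linalg.norm(w)       # project onto the unit hypersphere
t = ternarize(w_unit)                # ternary weights in {-1, 0, +1}
d = cosine_distance(w_unit, t)       # smaller d means closer directions
print(f"cosine distance: {d:.4f}, sparsity: {np.mean(t == 0):.2%}")
```

Cosine distance depends only on direction, so normalizing the weights onto the unit sphere does not change `d`; only the angle between the full-precision and ternary weights matters. The abstract ties a smaller angle to a less biased straight-through gradient estimate.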

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications

Methods: Pruning · Test