Pyramid Vector Quantization and Bit Level Sparsity in Weights for Efficient Neural Networks Inference

Vincenzo Liguori

arXiv:1911.10636·cs.CV·November 26, 2019·1 cites

Open Access

TL;DR

This paper discusses three basic blocks for CNN inference. It uses Pyramid Vector Quantization (PVQ) to quantize weights into highly sparse, compressible form and exploits PVQ's properties to eliminate multipliers during inference. The blocks are compared on Tiny Yolo v3.

Contribution

It presents PVQ as an effective weight quantizer that enables multiplier elimination and high sparsity, improving CNN inference efficiency.

Findings

01

PVQ produces highly sparse, compressible CNN weights

02

Multipliers are eliminated during inference while maintaining high performance

03

The basic blocks are compared using the Tiny Yolo v3 CNN

Abstract

This paper discusses three basic blocks for the inference of convolutional neural networks (CNNs). Pyramid Vector Quantization (PVQ) is discussed as an effective quantizer for CNN weights, resulting in highly sparse and compressible networks. Properties of PVQ are exploited for the elimination of multipliers during inference while maintaining high performance. The result is then extended to any other quantized weights. The Tiny Yolo v3 CNN is used to compare such basic blocks.
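
To make the abstract's two main ideas concrete, here is a minimal sketch in Python. It assumes the standard PVQ definition, in which a weight vector is mapped to an integer vector whose absolute values sum to a fixed number of pulses K. The function names `pvq_quantize` and `pvq_dot` are hypothetical, the greedy rounding is a simple stand-in rather than the paper's procedure, and the per-vector scale factor that a full PVQ scheme would apply is omitted.

```python
import math


def pvq_quantize(w, k):
    """Map w to an integer vector q with sum(|q_i|) == k (greedy rounding).

    Illustrative only: a simple rounding scheme, not the paper's quantizer.
    """
    l1 = sum(abs(v) for v in w)
    if l1 == 0:
        raise ValueError("cannot quantize an all-zero vector")
    # Scale magnitudes so that they sum to k, then round down.
    scaled = [abs(v) * k / l1 for v in w]
    q = [math.floor(s) for s in scaled]
    # Hand the leftover pulses to the entries with the largest remainders.
    remaining = max(k - sum(q), 0)
    order = sorted(range(len(w)), key=lambda i: scaled[i] - q[i], reverse=True)
    for i in order[:remaining]:
        q[i] += 1
    # Restore the signs of the original weights.
    return [qi if v >= 0 else -qi for qi, v in zip(q, w)]


def pvq_dot(q, x):
    """Dot product of integer weights q with inputs x using no multiplications.

    Each unit of |q_i| costs one addition or subtraction, so the total
    number of operations equals sum(|q_i|), which is k for a PVQ vector.
    """
    acc = 0.0
    ops = 0
    for qi, xi in zip(q, x):
        for _ in range(abs(qi)):
            acc = acc + xi if qi > 0 else acc - xi
            ops += 1
    return acc, ops


weights = [0.6, -1.4, 0.0, 2.0]
inputs = [1.0, 2.0, 3.0, 4.0]
q = pvq_quantize(weights, k=4)
result, ops = pvq_dot(q, inputs)
print(q, result, ops)
```

When K is smaller than the vector length, at least that many fewer entries can be nonzero, which is where the sparsity comes from. The operation count in `pvq_dot` depends only on K, not on the vector length.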

Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Image and Signal Denoising Methods