Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

Benjamin Hawks; Javier Duarte; Nicholas J. Fraser; Alessandro Pappalardo; Nhan Tran; Yaman Umuroglu

arXiv:2102.11289 · cs.LG · July 21, 2021

TL;DR

This paper introduces quantization-aware pruning, a technique combining pruning and quantization during training to create neural networks optimized for ultra low latency inference with improved efficiency.

Contribution

The study systematically explores the interplay of pruning and quantization-aware training, demonstrating the advantages of combining the two over using either technique alone and over other neural architecture search methods.

Findings

01 Quantization-aware pruning yields more computationally efficient models than pruning or quantization alone.

02 Quantization-aware pruning performs comparably to or better than Bayesian optimization in terms of computational efficiency.

03 Networks trained with different configurations can differ in information content, which affects their generalizability.

Abstract

Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra low latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes…
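
To make the two techniques named in the abstract concrete, the following is a minimal sketch in plain Python of magnitude-based pruning combined with uniform weight quantization. It is not the authors' implementation: the function names, the symmetric quantizer, the clipping range `w_max`, and the use of weight magnitude as the measure of an "insignificant" synapse are all assumptions made for illustration. The training loop is also omitted; in quantization-aware training the quantizer would be applied in the forward pass at every step.

```python
def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that removes the smallest-magnitude weights.

    `sparsity` is the fraction of weights to prune (hypothetical parameter).
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def fake_quantize(weights, bits, w_max=1.0):
    """Round weights onto a symmetric uniform grid with `bits` bits.

    Values are clipped to [-w_max, w_max] before rounding (assumed range).
    """
    levels = 2 ** (bits - 1) - 1  # number of positive grid steps
    scale = w_max / levels
    quantized = []
    for w in weights:
        clipped = max(-w_max, min(w_max, w))
        quantized.append(round(clipped / scale) * scale)
    return quantized


def apply_mask_and_quantize(weights, mask, bits):
    """Quantize the weights, then zero out the pruned positions."""
    return [q * m for q, m in zip(fake_quantize(weights, bits), mask)]


# Toy example: prune half of six weights and quantize the rest to 4 bits.
weights = [0.9, -0.05, 0.4, 0.02, -0.7, 0.1]
mask = magnitude_prune_mask(weights, sparsity=0.5)
effective = apply_mask_and_quantize(weights, mask, bits=4)
print(mask)
print(effective)
```

In this sketch the mask is computed once from the full-precision weights. A training procedure would typically recompute or tighten the mask over several pruning iterations while continuing to train with the quantizer in place.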

Code & Models

Repositories

ben-hawks/pytorch-jet-classify (PyTorch, official)

Taxonomy

Methods: Pruning