Quantizing deep convolutional networks for efficient inference: A whitepaper

Raghuraman Krishnamoorthi

arXiv:1806.08342 · cs.LG · June 22, 2018 · 758 citations

TL;DR

This paper reviews techniques for quantizing convolutional neural networks to 8-bit precision, enabling significant reductions in model size and inference latency with minimal accuracy loss, and introduces tools for practical implementation.
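Most of the size reduction is plain storage arithmetic: a float32 weight occupies 4 bytes and an 8-bit weight occupies 1. The rough sketch below assumes a hypothetical 3x3 convolution with 256 input and 256 output channels. It also assumes that each output channel stores one float32 scale and one 8-bit zero-point. The real metadata layout depends on the runtime.

```python
# Hypothetical layer shape: 3x3 kernel, 256 input channels, 256 output channels.
KERNEL_H, KERNEL_W = 3, 3
IN_CH, OUT_CH = 256, 256

FLOAT32_BYTES = 4  # one IEEE-754 single-precision weight
INT8_BYTES = 1     # one 8-bit integer weight
# Assumption: each output channel stores a float32 scale and an 8-bit zero-point.
PER_CHANNEL_META_BYTES = 4 + 1


def conv_weight_count(kh: int, kw: int, cin: int, cout: int) -> int:
    """Number of weights in a standard (non-grouped) convolution kernel."""
    return kh * kw * cin * cout


def float32_size(n_weights: int) -> int:
    """Bytes needed to store the weights as float32."""
    return n_weights * FLOAT32_BYTES


def int8_size(n_weights: int, n_channels: int) -> int:
    """Bytes for int8 weights plus per-output-channel quantization parameters."""
    return n_weights * INT8_BYTES + n_channels * PER_CHANNEL_META_BYTES


n = conv_weight_count(KERNEL_H, KERNEL_W, IN_CH, OUT_CH)
fp = float32_size(n)
q = int8_size(n, OUT_CH)
print(f"weights={n}, float32={fp} B, int8={q} B, ratio={fp / q:.3f}x")
```

For this layer the per-channel metadata is a fraction of a percent of the total, so the ratio stays just under 4x. The overhead becomes noticeably larger for layers with few weights per channel, such as depthwise convolutions.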

Contribution

It provides a comprehensive overview of post-training and quantization-aware training methods, benchmarks their performance on various hardware, and offers best practices and tools for deployment.

Findings

01

8-bit post-training quantization keeps classification accuracy within 2% of floating-point networks

02

Quantized models run 2x-3x faster than floating point on CPUs and up to 10x faster on specialized processors with fixed-point SIMD

03

Quantization-aware training narrows the accuracy gap to 1% and enables lower precision with acceptable accuracy loss
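Quantization-aware training typically simulates quantization in the forward pass and lets gradients flow in the backward pass. The minimal sketch below assumes an affine 8-bit quantizer with a fixed range of [-1, 1]. It also assumes a straight-through estimator that passes gradients unchanged inside the quantization range and zeroes them outside it. The function names and the one-weight toy problem are hypothetical.

```python
def quant_params(x_min: float, x_max: float, num_bits: int = 8):
    """Affine (asymmetric) scale and integer zero-point for the range [x_min, x_max].

    The range is widened to include 0.0 so that zero maps to an exact integer.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    q_max = 2 ** num_bits - 1
    scale = (x_max - x_min) / q_max if x_max > x_min else 1.0
    zero_point = max(0, min(q_max, round(-x_min / scale)))
    return scale, zero_point


def fake_quant(x: float, scale: float, zero_point: int, num_bits: int = 8) -> float:
    """Forward pass: quantize to an integer level, clamp, then map back to float.

    Note: Python's round() rounds halves to even.
    """
    q_max = 2 ** num_bits - 1
    q = max(0, min(q_max, round(x / scale) + zero_point))
    return (q - zero_point) * scale


def fake_quant_grad(x: float, upstream: float, scale: float, zero_point: int,
                    num_bits: int = 8) -> float:
    """Backward pass with a straight-through estimator.

    Rounding is treated as the identity, so the gradient passes through unchanged
    inside the representable range and is zero where the forward pass clamped.
    """
    q_max = 2 ** num_bits - 1
    lo = (0 - zero_point) * scale
    hi = (q_max - zero_point) * scale
    return upstream if lo <= x <= hi else 0.0


# Toy QAT loop (hypothetical problem): fit one weight w so that fake_quant(w) * x matches y.
x, y = 2.0, 0.5
w, lr = 0.9, 0.05
scale, zp = quant_params(-1.0, 1.0)
for _ in range(50):
    w_q = fake_quant(w, scale, zp)                    # forward sees the quantized weight
    grad_wq = 2.0 * (w_q * x - y) * x                 # d/d(w_q) of (w_q * x - y) ** 2
    w -= lr * fake_quant_grad(w, grad_wq, scale, zp)  # STE routes the gradient to w
print(f"float weight {w:.5f}, quantized weight {fake_quant(w, scale, zp):.5f}")
```

The forward pass always sees the quantized weight, so training settles on a value that works after quantization. Here that value is a grid point within one quantization step of the target 0.25, not an unquantized optimum.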

Abstract

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8 bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post-training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further…
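Per-channel weight quantization gives every output channel its own scale, while per-layer quantization shares one scale across the whole tensor. The minimal sketch below compares the worst-case reconstruction error of the two schemes. It assumes a hypothetical two-channel weight matrix whose channels have very different ranges. It also uses a symmetric int8 quantizer for brevity, although the same idea applies to a scale/zero-point quantizer.

```python
# Hypothetical weights: each row is an output channel, with very different ranges.
WEIGHTS = [
    [0.80, -1.20, 0.55, 1.00],      # wide-range channel
    [0.010, -0.020, 0.015, 0.005],  # narrow-range channel
]

Q_MAX = 127  # symmetric int8 range [-127, 127]


def symmetric_scale(values):
    """Scale that maps the largest magnitude in `values` to Q_MAX."""
    max_abs = max(abs(v) for v in values)
    return max_abs / Q_MAX if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest integer step and clamp to the int8 range."""
    return [max(-Q_MAX, min(Q_MAX, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer levels back to floating point."""
    return [q * scale for q in q_values]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


def per_layer_errors(weights):
    """One scale shared by every channel in the layer."""
    scale = symmetric_scale([w for row in weights for w in row])
    return [max_abs_error(row, dequantize(quantize(row, scale), scale)) for row in weights]


def per_channel_errors(weights):
    """A separate scale for each output channel (row)."""
    errors = []
    for row in weights:
        scale = symmetric_scale(row)
        errors.append(max_abs_error(row, dequantize(quantize(row, scale), scale)))
    return errors


for name, errs in [("per-layer", per_layer_errors(WEIGHTS)),
                   ("per-channel", per_channel_errors(WEIGHTS))]:
    print(name, [f"{e:.6f}" for e in errs])
```

With a shared scale, the narrow-range channel is rounded onto only a few integer levels, so its error is far larger than with a scale of its own. The wide-range channel sets the shared scale, so its error is the same under both schemes. The `dequantize` step also shows one way to use weight-only quantization: store integer weights to cut model size, then convert them back to floating point before computing when 8-bit arithmetic is unavailable.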

Code & Models

Models

mrbrownn43/YOLOv11_customized-for-RaspberryPi4 (Hugging Face model)

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning