ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks

Ahmed T. Elthakeb; Prannoy Pilligundla; FatemehSadat Mireshghallah; Amir Yazdanbakhsh; Hadi Esmaeilzadeh

arXiv:1811.01704 · cs.LG · April 17, 2020 · 46 cites

Open Access

TL;DR

ReLeQ employs reinforcement learning to automate deep quantization of neural networks, significantly reducing computation and storage costs while maintaining accuracy, thus enabling faster inference on standard hardware.

Contribution

This paper introduces ReLeQ, an end-to-end reinforcement learning framework that automates the selection of per-layer quantization levels (bitwidths) for deep neural networks, improving efficiency with minimal accuracy loss.

Findings

01. Achieves less than 0.3% accuracy loss with quantization
02. Enables 2.2x speedup on standard hardware
03. Provides 2.0x energy reduction with custom accelerators

Abstract

Deep Neural Networks (DNNs) typically require massive amounts of computational resources for inference in computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of network encodings. Recent research affirms that carefully selecting the quantization levels for each layer can preserve accuracy while pushing the bitwidth below eight bits. However, without arduous manual effort, this deep quantization can lead to significant accuracy loss, which leaves its utility in question. Deep quantization thus opens a large hyper-parameter space (the bitwidth of each layer), the exploration of which is a major challenge. We propose a systematic approach to tackle this problem by automating the discovery of quantization levels through an end-to-end deep reinforcement learning framework (ReLeQ). We adapt…
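To make the trade-off in the abstract concrete, here is a minimal Python sketch of per-layer bitwidth selection. It is not ReLeQ's method: the symmetric uniform quantizer, the layer sizes, the bitwidth assignment, and the function names `quantize_uniform`, `storage_bits`, and `search_space_size` are all assumptions made for illustration. It shows only that lower bitwidths shrink storage, raise quantization error, and that the number of per-layer configurations grows exponentially with depth.

```python
import random


def quantize_uniform(weights, bits):
    """Symmetric uniform quantization onto a signed `bits`-bit grid.

    Assumption: this is a generic textbook quantizer, not the scheme
    used in the paper.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed grid")
    levels = 2 ** (bits - 1) - 1  # largest positive integer code
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / levels  # real-valued step between grid points
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(original, quantized):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(original, quantized)) / len(original)


def storage_bits(layer_sizes, bitwidths):
    """Total weight storage in bits for one bitwidth per layer."""
    return sum(n * b for n, b in zip(layer_sizes, bitwidths))


def search_space_size(num_layers, num_choices):
    """Number of distinct per-layer bitwidth assignments."""
    return num_choices ** num_layers


# Hypothetical 4-layer network (weight counts are made up).
layer_sizes = [500, 25_000, 400_000, 5_000]
uniform_8bit = [8, 8, 8, 8]
mixed = [8, 4, 2, 8]  # hypothetical heterogeneous assignment

print("8-bit storage (bits):", storage_bits(layer_sizes, uniform_8bit))
print("mixed storage (bits):", storage_bits(layer_sizes, mixed))
print("configurations for 4 layers, 8 choices:", search_space_size(4, 8))

# Quantization error grows as the bitwidth shrinks.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for bits in (8, 4, 2):
    err = mean_abs_error(sample, quantize_uniform(sample, bits))
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

An automated search such as the one the abstract describes would have to choose among these configurations while balancing accuracy against cost; the sketch only enumerates the size of that choice and does not perform any search.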

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution · Dense Connections · LeNet