ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks
Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

TL;DR
ReLeQ employs reinforcement learning to automate deep quantization of neural networks, significantly reducing computation and storage costs while maintaining accuracy, thus enabling faster inference on standard hardware.
Contribution
This paper introduces ReLeQ, an end-to-end reinforcement learning framework that automates the selection of per-layer quantization levels (bitwidths) for deep neural networks, improving efficiency with minimal accuracy loss.
Findings
Achieves less than 0.3% accuracy loss with quantization
Enables 2.2x speedup on standard hardware
Provides 2.0x energy reduction with custom accelerators
Abstract
Deep Neural Networks (DNNs) typically require massive amounts of computational resources in inference tasks for computer vision applications. Quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of network encodings. Recent research affirms that carefully selecting the quantization levels for each layer can preserve the accuracy while pushing the bitwidth below eight bits. However, without arduous manual effort, this deep quantization can lead to significant accuracy loss, leaving it in a position of questionable utility. As such, deep quantization opens a large hyper-parameter space (bitwidth of the layers), the exploration of which is a major challenge. We propose a systematic approach to tackle this problem, by automating the process of discovering the quantization levels through an end-to-end deep reinforcement learning framework (ReLeQ). We adapt…
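To make the abstract's two central ideas concrete, the following is a minimal sketch, not the paper's implementation. It assumes a generic symmetric uniform quantizer (the source does not say which quantizer ReLeQ uses), and the names `quantize_uniform`, `mean_squared_error`, and `search_space_size` are hypothetical helpers introduced only for illustration. It shows how lowering the bitwidth coarsens a layer's weights, and how quickly the per-layer bitwidth search space grows.

```python
def quantize_uniform(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits.

    Assumption: a generic textbook scheme, not necessarily ReLeQ's quantizer.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1                   # positive integer levels
    scale = max(abs(w) for w in weights) or 1.0    # avoid dividing by zero
    # Map to integers in [-levels, levels], then back to real values.
    return [round(w / scale * levels) * scale / levels for w in weights]


def mean_squared_error(original, quantized):
    """Average squared difference between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(original, quantized)) / len(original)


def search_space_size(num_layers, candidate_bitwidths):
    """Number of distinct per-layer bitwidth assignments."""
    return len(candidate_bitwidths) ** num_layers


if __name__ == "__main__":
    layer_weights = [0.9, -0.45, 0.1, -0.02, 0.6]  # made-up example values
    for bits in (2, 4, 8):
        error = mean_squared_error(layer_weights, quantize_uniform(layer_weights, bits))
        print(f"{bits}-bit quantization error: {error:.6f}")
    # Hypothetical setup: 20 layers, each choosing from 5 candidate bitwidths.
    print("assignments to explore:", search_space_size(20, [2, 3, 4, 5, 8]))
```

Because the number of assignments grows exponentially with the layer count, exhaustive search quickly becomes impractical, which is the motivation the abstract gives for an automated search.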
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution · Dense Connections · LeNet
