Neural Networks with Quantization Constraints
Ignacio Hounie, Juan Elenter, Alejandro Ribeiro

TL;DR
This paper introduces a constrained optimization approach to quantization-aware training of neural networks. The approach yields efficient low-precision models with minimal performance loss, and it uses the resulting dual variables to analyze each layer's sensitivity to quantization.
Contribution
It formulates quantization-aware training as a constrained optimization problem that avoids gradient approximations. It also uses the dual variables as a measure of each layer's sensitivity to quantization, which improves mixed-precision quantization.
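A constrained formulation of this kind can be written generically as shown below, with one constraint per layer. This is an assumption for exposition: the constraint functions $g_l$ and tolerances $\epsilon_l$ are hypothetical placeholders, since the exact constraints the paper imposes are not stated in this summary. The block gives the primal problem, its Lagrangian, its dual, and the standard sensitivity bound that holds whenever strong duality holds and $\boldsymbol{\lambda}^\star$ is a dual optimum.

$$
\begin{aligned}
P^\star(\boldsymbol{\epsilon}) &= \min_{\mathbf{w}} \ \mathbb{E}_{(x,y)}\big[\ell(f_{\mathbf{w}}(x), y)\big] \quad \text{s.t.} \quad g_l(\mathbf{w}) \le \epsilon_l, \quad l = 1, \dots, L, \\
\mathcal{L}(\mathbf{w}, \boldsymbol{\lambda}) &= \mathbb{E}_{(x,y)}\big[\ell(f_{\mathbf{w}}(x), y)\big] + \sum_{l=1}^{L} \lambda_l \big(g_l(\mathbf{w}) - \epsilon_l\big), \\
D^\star(\boldsymbol{\epsilon}) &= \max_{\boldsymbol{\lambda} \succeq 0} \ \min_{\mathbf{w}} \ \mathcal{L}(\mathbf{w}, \boldsymbol{\lambda}), \\
P^\star(\boldsymbol{\epsilon}') &\ge P^\star(\boldsymbol{\epsilon}) - \sum_{l=1}^{L} \lambda_l^\star \big(\epsilon_l' - \epsilon_l\big) \quad \text{whenever } P^\star(\boldsymbol{\epsilon}) = D^\star(\boldsymbol{\epsilon}).
\end{aligned}
$$

The last line follows from weak duality for the perturbed problem. Setting $\epsilon_l' = \epsilon_l - \delta$ shows that tightening layer $l$'s tolerance by $\delta > 0$ raises the optimal loss by at least $\lambda_l^\star \delta$. This is one way to see why a large $\lambda_l^\star$ marks a quantization-sensitive layer.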
Findings
Competitive accuracy in image classification tasks
Layer sensitivity analysis guides effective quantization
Significant performance gains with mixed precision quantization
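Per-layer dual variables can be turned into a mixed-precision assignment in several ways. One hypothetical rule is to keep the layers with the largest dual variables at higher precision and quantize the rest more aggressively. The function `allocate_bits`, its default bit-widths, and the dual values in the example are illustrative assumptions, not the procedure or results reported in the paper. By complementary slackness, a zero dual variable means that layer's constraint is inactive at the optimum.

```python
def allocate_bits(duals, high_bits=8, low_bits=4, n_high=1):
    """Hypothetical rule: the n_high layers with the largest dual variables
    (treated as the most quantization-sensitive) keep high_bits; the rest get low_bits."""
    # Rank layer indices from largest to smallest dual variable (stable for ties).
    order = sorted(range(len(duals)), key=lambda i: duals[i], reverse=True)
    sensitive = set(order[:n_high])
    return [high_bits if i in sensitive else low_bits for i in range(len(duals))]


# Made-up dual variables for four layers, for illustration only.
print(allocate_bits([0.0, 12.0, 3.0, 0.5], n_high=2))  # [4, 8, 8, 4]
```

A fixed count `n_high` keeps the sketch short. A budget-based rule, such as a cap on total bits, would be a natural alternative.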
Abstract
Enabling low-precision implementations of deep learning models without considerable performance degradation is necessary in resource- and latency-constrained settings. Moreover, exploiting the differences in sensitivity to quantization across layers can allow mixed-precision implementations to achieve a considerably better trade-off between computation and performance. However, backpropagating through the quantization operation requires introducing gradient approximations, and choosing which layers to quantize is challenging for modern architectures due to the large search space. In this work, we present a constrained learning approach to quantization-aware training. We formulate low-precision supervised learning as a constrained optimization problem, and show that despite its non-convexity, the resulting problem is strongly dual and does away with gradient estimation. Furthermore, we show that…
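The following toy sketch shows how a constrained problem of this kind can be trained with gradient descent on the weights and projected gradient ascent on the dual variables. Everything in it is an assumption for illustration and is not the paper's algorithm. This includes the hypothetical `quantize` and `primal_dual_toy` functions, a quadratic per-layer loss $s_l (w_l - t_l)^2$ with one scalar weight per "layer", and a constraint keeping each weight within $\sqrt{\epsilon}$ of its quantized value. In this toy, the quantizer appears only inside the constraint. Because it is piecewise constant, the derivative $2(w_l - Q(w_l))$ is exact almost everywhere and no straight-through estimator is needed. The summary does not say whether the paper avoids gradient estimation in this particular way.

```python
import math


def quantize(w, step=1.0):
    """Uniform round-to-nearest quantizer with grid spacing `step` (hypothetical)."""
    return step * math.floor(w / step + 0.5)


def primal_dual_toy(targets, curvatures, eps, step=1.0, eta=0.01, rho=20.0, iters=5000):
    """Simultaneous gradient descent-ascent on the Lagrangian of the toy problem

        minimize   sum_l s_l * (w_l - t_l)**2
        subject to (w_l - Q(w_l))**2 <= eps   for every layer l.

    eta (primal step) and rho (dual step) are hand-picked for this toy;
    larger steps can make gradient descent-ascent oscillate or diverge.
    """
    w = list(targets)             # start from the unconstrained "full-precision" optimum
    lam = [0.0] * len(targets)    # one dual variable per layer constraint
    for _ in range(iters):
        new_w, new_lam = [], []
        for wl, tl, sl, ll in zip(w, targets, curvatures, lam):
            err = wl - quantize(wl, step)                # quantization error of layer l
            grad_w = 2 * sl * (wl - tl) + 2 * ll * err   # dL/dw_l (Q' = 0 almost everywhere)
            slack = err ** 2 - eps                       # constraint value minus tolerance
            new_w.append(wl - eta * grad_w)              # primal gradient descent
            new_lam.append(max(0.0, ll + rho * slack))   # projected dual gradient ascent
        w, lam = new_w, new_lam
    return w, lam


w, lam = primal_dual_toy(targets=[0.4, 0.4], curvatures=[4.0, 1.0], eps=0.01)
print([round(x, 4) for x in w], [round(x, 4) for x in lam])
```

The example uses two hypothetical layers that share the target 0.4 but differ in loss curvature (4 versus 1). Both weights settle near 0.1, which is within $\sqrt{\epsilon} = 0.1$ of the grid point 0. The dual variables approach 12 and 3, matching the KKT value $\lambda_l = s_l (t_l / \sqrt{\epsilon} - 1)$. The layer whose loss is four times more curved ends up with four times the dual variable, which is the sense in which dual variables rank layer sensitivity.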
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attentive Walk-Aggregating Graph Neural Network
