Adaptive Binary-Ternary Quantization
Ryan Razani, Grégoire Morin, Vahid Partovi Nia, and Eyyüb Sari

TL;DR
This paper introduces Smart Quantization, an adaptive method that combines binary and ternary quantization in neural networks, so that a quantized model for resource-constrained devices can be trained once while retaining high accuracy.
Contribution
The paper proposes a novel adaptive quantization method that adjusts quantization depth during training, reducing the need for multiple training runs.
Findings
Successfully adapts quantization depth during training
Maintains high accuracy on MNIST and CIFAR10
Reduces training complexity for quantized models
Abstract
Neural network models are resource hungry. It is difficult to deploy such deep networks on devices with limited resources, like smart wearables, cellphones, drones, and autonomous vehicles. Low-bit quantization, such as binary and ternary quantization, is a common approach to alleviate these resource requirements. Ternary quantization provides a more flexible model and outperforms binary quantization in terms of accuracy, but it doubles the memory footprint and increases the computational cost. In contrast to these approaches, mixed quantized models allow a trade-off between accuracy and memory footprint. In such models, quantization depth is often chosen manually or tuned using a separate optimization routine. The latter requires training a quantized network multiple times. Here, we propose an adaptive combination of binary and ternary quantization, namely Smart Quantization (SQ), in…
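To make the binary versus ternary trade-off in the abstract concrete, here is a minimal sketch of the two quantizers and their storage cost. It is not the paper's Smart Quantization method: the scaling factor (mean absolute weight), the 0.7 threshold ratio, and the function names `binarize`, `ternarize`, and `storage_bits` are assumptions chosen for illustration, following common heuristics for binary and ternary weight networks.

```python
def binarize(weights):
    """Map each weight to {-alpha, +alpha}; alpha is the mean absolute weight (assumed scaling)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to {-alpha, 0, +alpha}.

    Weights whose magnitude is at most delta are zeroed. The threshold
    delta = threshold_ratio * mean(|w|) is an assumed heuristic, not a
    value taken from the paper.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return [0.0 if abs(w) <= delta else (alpha if w > 0 else -alpha)
            for w in weights]


def storage_bits(num_weights, bits_per_weight):
    """Bits needed to store the quantized codes (scaling factors excluded)."""
    return num_weights * bits_per_weight


weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]
binary = binarize(weights)    # two levels, 1 bit per weight
ternary = ternarize(weights)  # three levels, 2 bits per weight

binary_bits = storage_bits(len(weights), 1)
ternary_bits = storage_bits(len(weights), 2)
```

In this sketch the ternary code keeps small weights at exactly zero, which is the extra flexibility the abstract refers to, at the price of two bits per weight instead of one.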
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
