Quantization-Guided Training for Compact TinyML Models
Sedigh Ghamari, Koray Ozcan, Thu Dinh, Andrey Melnikov, Juan Carvajal, Jan Ernst, Sek Chai

TL;DR
This paper introduces Quantization Guided Training (QGT), a method that compresses deep neural networks to bit precisions below 8 bits, producing tiny models suited to resource-constrained environments with little accuracy loss.
Contribution
The paper presents a novel quantization-aware training approach that uses customized regularization to optimize low-bit-precision models and identify compression bottlenecks.
Findings
Achieved a 17.7x size reduction at 2-bit precision on a tiny person detection model.
Limited the accuracy drop to 3% relative to the floating-point baseline.
Validated effectiveness on state-of-the-art architectures and vision datasets.
Abstract
We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets and reach extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with an 81KB tiny model for person detection down to 2-bit precision (representing 17.7x size reduction), while maintaining an accuracy drop of only 3% compared to a floating-point baseline.
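The idea of a regularizer that pulls weights towards quantization-friendly values can be illustrated with a small sketch. This is not the paper's regularizer, whose exact form is not given in the abstract. It assumes a symmetric uniform grid and a mean-squared-distance penalty, and the names `uniform_quantize`, `quantization_penalty`, `total_loss`, and the weighting factor `lam` are hypothetical.

```python
import numpy as np


def uniform_quantize(w, num_bits):
    """Snap each weight to the nearest point of a symmetric uniform grid.

    Assumption: the grid has 2**num_bits levels spanning [-max|w|, +max|w|].
    """
    levels = 2 ** num_bits
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    step = 2 * max_abs / (levels - 1)
    # Shift to [0, 2*max_abs], round to the nearest grid index, shift back.
    return np.round((w + max_abs) / step) * step - max_abs


def quantization_penalty(w, num_bits):
    """Mean squared distance between the weights and their quantized values."""
    return np.mean((w - uniform_quantize(w, num_bits)) ** 2)


def total_loss(task_loss, weights, num_bits, lam):
    """Task loss plus a weighted quantization penalty summed over all layers.

    `lam` (hypothetical) trades off accuracy against quantization error.
    """
    penalty = sum(quantization_penalty(w, num_bits) for w in weights)
    return task_loss + lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
    for bits in (2, 4, 8):
        per_layer = [quantization_penalty(w, bits) for w in layers]
        print(bits, per_layer)
```

Computing the penalty per layer, as in the loop above, is one plausible way to see which layers resist a given bit width. The abstract's claim about identifying compression bottlenecks may rely on a different mechanism.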
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
