TL;DR
This paper introduces Adaptive Loss-aware Quantization (ALQ), a method for compressing neural networks into multi-bit networks that reaches an average bitwidth below one bit without notable loss in inference accuracy, easing deployment on resource-constrained devices.
Contribution
ALQ is a new multi-bit network quantization approach that directly minimizes the quantization-induced error on the loss function, with neither gradient approximation nor maintenance of full-precision weights, enabling ultra-low bitwidth compression without notable accuracy loss.
Findings
ALQ achieves an average bitwidth below one bit with minimal accuracy loss.
ALQ outperforms state-of-the-art methods in storage efficiency and accuracy.
Experimental results validate ALQ's effectiveness on popular image datasets.
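An average bitwidth below one bit is possible when bitwidth is assigned per group of weights and some groups receive zero binary bases. The sketch below illustrates the arithmetic only. It assumes the average is the total number of stored basis bits divided by the total number of weights, and it ignores the storage for the floating-point coefficients. The function name, group sizes, and bitwidths are hypothetical and not taken from the paper.

```python
from typing import Sequence


def average_bitwidth(group_sizes: Sequence[int], group_bitwidths: Sequence[int]) -> float:
    """Average bits per weight when each group has its own bitwidth.

    Assumption: a group with bitwidth b stores b binary values per weight,
    and a group with bitwidth 0 is dropped entirely.
    """
    if len(group_sizes) != len(group_bitwidths):
        raise ValueError("one bitwidth is required per group")
    total_weights = sum(group_sizes)
    total_bits = sum(size * bits for size, bits in zip(group_sizes, group_bitwidths))
    return total_bits / total_weights


# Hypothetical layer: three groups, one of which is pruned to zero bases.
sizes = [64, 64, 128]
bitwidths = [2, 1, 0]
print(average_bitwidth(sizes, bitwidths))  # (64*2 + 64*1 + 128*0) / 256 = 0.75
```

Here half of the weights carry no bits at all, which pulls the average under one even though no stored group uses fewer than one bit.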
Abstract
We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on low-resource mobile and embedded platforms. We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that is able to achieve an average bitwidth below one-bit without notable loss in inference accuracy. Unlike previous MBN quantization solutions that train a quantizer by minimizing the error to reconstruct full precision weights, ALQ directly minimizes the quantization-induced error on the loss function involving neither gradient approximation nor full precision maintenance. ALQ also exploits strategies including adaptive bitwidth, smooth bitwidth reduction, and iterative trained quantization to allow a smaller network size…
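To make the idea of "multiple binary bases" concrete, the following sketch approximates a weight vector as a sum of scaled vectors with entries in {-1, +1}. It uses a simple greedy residual fit that minimizes reconstruction error, which is the kind of baseline the abstract contrasts ALQ against, not ALQ's loss-aware procedure. All function names and the example weights are assumptions made for illustration.

```python
from typing import List, Tuple


def greedy_multibit_quantize(
    weights: List[float], num_bases: int
) -> Tuple[List[float], List[List[int]]]:
    """Approximate weights as sum_i alphas[i] * bases[i], bases[i] in {-1, +1}^n."""
    residual = list(weights)
    alphas: List[float] = []
    bases: List[List[int]] = []
    for _ in range(num_bases):
        # Binary basis: the sign pattern of the current residual.
        beta = [1 if r >= 0 else -1 for r in residual]
        # Least-squares optimal scale for this sign pattern is the mean |residual|.
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        bases.append(beta)
        # Subtract the fitted term and fit the next basis to what is left.
        residual = [r - alpha * b for r, b in zip(residual, beta)]
    return alphas, bases


def reconstruct(alphas: List[float], bases: List[List[int]], length: int) -> List[float]:
    """Rebuild the approximate weights from coefficients and binary bases."""
    out = [0.0] * length
    for alpha, beta in zip(alphas, bases):
        for j, b in enumerate(beta):
            out[j] += alpha * b
    return out


def squared_error(weights: List[float], approx: List[float]) -> float:
    return sum((w - a) ** 2 for w, a in zip(weights, approx))


# Hypothetical weights: the error shrinks as more binary bases are used.
weights = [0.9, -0.4, 0.1, -0.7]
for m in (1, 2, 3):
    alphas, bases = greedy_multibit_quantize(weights, m)
    err = squared_error(weights, reconstruct(alphas, bases, len(weights)))
    print(m, round(err, 4))
```

Each extra basis costs one more bit per weight plus one coefficient. That is the storage versus accuracy trade-off that adaptive bitwidth schemes try to balance per group.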
Videos
Adaptive Loss-Aware Quantization for Multi-Bit Networks (YouTube)
