MSQ: Memory-Efficient Bit Sparsification Quantization
Seokho Han, Seoyeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko

TL;DR
MSQ is a memory-efficient quantization method that cuts trainable parameters by up to 8.00x and training time by up to 86% while maintaining competitive accuracy, making mixed-precision quantization of deep neural networks more practical for resource-limited devices.
Contribution
MSQ presents a novel differentiable and regularized approach to bit sparsification quantization that reduces complexity and memory usage in training deep neural networks.
Findings
Achieves up to 8.00x reduction in trainable parameters.
Reduces training time by up to 86%.
Maintains competitive accuracy and compression rates.
Abstract
As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and accuracy compared to uniform quantization. However, finding the optimal precision for each layer is challenging. Recent studies utilizing bit-level sparsity have shown promise, yet they often introduce substantial training complexity and high GPU memory requirements. In this paper, we propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations. MSQ applies a round-clamp quantizer to enable differentiable computation of the least significant bits (LSBs) from model weights. It further employs regularization to induce sparsity in these LSBs, enabling effective precision reduction without explicit bit-level…
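To make the two ideas in the abstract concrete, here is a minimal sketch of a round-clamp quantizer and an L1 penalty on the least significant bits. It is an illustration under stated assumptions, not the authors' implementation: the function names (`round_clamp`, `lsb_value`, `lsb_l1_penalty`), the assumption that weights are pre-scaled to [0, 1], the use of a modulus to read off the LSBs, and the choice of an L1 penalty are all hypothetical. Gradient handling (for example a straight-through estimator for the rounding step) is omitted.

```python
def round_clamp(w, bits):
    """Map a weight w (assumed pre-scaled to [0, 1]) to an integer code
    in [0, 2**bits - 1] by rounding and then clamping."""
    levels = 2 ** bits
    code = int(w * levels + 0.5)          # round half up (w is non-negative)
    return max(0, min(levels - 1, code))  # clamp to the representable range


def lsb_value(w, bits, k):
    """Integer value stored in the k least significant bits of the code.
    Hypothetical: read off with a modulus rather than explicit bit splitting."""
    return round_clamp(w, bits) % (2 ** k)


def lsb_l1_penalty(weights, bits, k):
    """Illustrative L1 regularizer: it is zero exactly when every weight's
    k LSBs are zero, i.e. when the layer could drop k bits of precision."""
    return sum(abs(lsb_value(w, bits, k)) for w in weights)


if __name__ == "__main__":
    layer = [0.5, 0.51]
    print([round_clamp(w, 8) for w in layer])
    print(lsb_l1_penalty(layer, bits=8, k=2))
```

In this sketch, a penalty of zero for a layer means its weights are already representable with `bits - k` bits, which is the sense in which sparse LSBs translate into lower precision without tracking each bit as a separate trainable parameter.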
