MSQ: Memory-Efficient Bit Sparsification Quantization

Seokho Han; Seoyeon Yoon; Jinhee Kim; Dongwei Wang; Kang Eun Jeon; Huanrui Yang; Jong Hwan Ko

arXiv:2507.22349·cs.LG·July 31, 2025

TL;DR

MSQ is a memory-efficient mixed-precision quantization method that cuts trainable parameters by up to 8.00x and training time by up to 86% while keeping accuracy competitive, which makes deploying deep neural networks on resource-limited devices more practical.

Contribution

MSQ computes the least significant bits (LSBs) of model weights differentiably with a round-clamp quantizer and regularizes those LSBs toward sparsity, which reduces the training complexity and GPU memory usage of bit-level sparsification quantization.

Findings

01 Achieves up to 8.00x reduction in trainable parameters.

02 Reduces training time by up to 86%.

03 Maintains competitive accuracy and compression rates.

Abstract

As deep neural networks (DNNs) see increased deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored, as it offers a superior balance between efficiency and accuracy compared to uniform quantization. However, finding the optimal precision for each layer is challenging. Recent studies utilizing bit-level sparsity have shown promise, yet they often introduce substantial training complexity and high GPU memory requirements. In this paper, we propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations. MSQ applies a round-clamp quantizer to enable differentiable computation of the least significant bits (LSBs) from model weights. It further employs regularization to induce sparsity in these LSBs, enabling effective precision reduction without explicit bit-level…
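
The LSB idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' formulation: weights are taken to be normalized to [0, 1], the hypothetical helper `round_clamp` maps them to integer codes, the LSB value is taken as the code modulo 2^k, and the penalty is a plain L1 sum. The sketch covers only the forward computation; how gradients pass through the rounding step is not described in this excerpt.

```python
def round_clamp(w, n_bits):
    """Map a weight to an integer code in [0, 2**n_bits - 1] (assumed scheme)."""
    levels = 2 ** n_bits - 1
    code = int(w * levels + 0.5)      # round to nearest for non-negative inputs
    return max(0, min(levels, code))  # clamp out-of-range values


def lsb_values(weights, n_bits, k):
    """Value stored in the k least significant bits of each n-bit code."""
    return [round_clamp(w, n_bits) % (2 ** k) for w in weights]


def lsb_l1_penalty(weights, n_bits, k):
    """L1 penalty on the LSB values; zero means the k LSBs are unused."""
    return sum(lsb_values(weights, n_bits, k))


def can_drop_lsbs(weights, n_bits, k):
    """True when every code has zero LSBs, so n_bits - k bits would suffice."""
    return all(v == 0 for v in lsb_values(weights, n_bits, k))


if __name__ == "__main__":
    dense = [1 / 15, 0.5, 1.0]                # 4-bit codes with nonzero LSBs
    sparse = [0.0, 4 / 15, 8 / 15, 12 / 15]   # 4-bit codes that are multiples of 4
    print(lsb_values(dense, 4, 2), lsb_l1_penalty(dense, 4, 2))
    print(can_drop_lsbs(sparse, 4, 2))
```

Under these assumptions, a layer whose penalty reaches zero stores nothing in its lowest k bits, so it could be represented with n - k bits without changing any weight value.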
