Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
Di Zhang, Xun Wu, Shaohan Huang, Yudong Wang, Hanyong Shao, Yingbo Hao, Zewen Chi, Li Dong, Ting Song, Yan Xia, Zhifang Sui, Furu Wei

TL;DR
Sparse-BitNet shows that 1.58-bit BitNet tolerates semi-structured N:M sparsity better than full-precision models, and that combining the two with stable training yields up to 1.30× faster training and inference with smaller performance degradation at the same sparsity level.
Contribution
This work introduces Sparse-BitNet, a unified framework that jointly applies ultra-low-bit quantization and semi-structured sparsity, ensuring stable training and improved efficiency for LLMs.
Findings
1.58-bit BitNet degrades less than full-precision baselines at the same sparsity levels and tolerates higher structured sparsity before accuracy collapses.
Sparse-BitNet achieves up to a 1.30× speedup in training and inference.
Low-bit quantization and N:M sparsity are highly compatible, and combining them is beneficial.
Abstract
Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that jointly applies 1.58-bit quantization and dynamic N:M sparsification while ensuring stable training for the first time. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels and can tolerate higher structured sparsity before accuracy collapse. Moreover, using our custom sparse…
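To make the two ingredients named in the abstract concrete, the following is a minimal sketch in plain Python, not the Sparse-BitNet implementation. It assumes the absmean rounding commonly described for BitNet b1.58 (scale by the mean absolute weight, round, clip to {-1, 0, 1}) and a simple magnitude-based rule for choosing which N weights to keep in each group of M. The function names (ternary_quantize, nm_mask, sparse_ternary), the choice to compute the mask on the latent full-precision weights, and the choice to compute the scale before masking are all illustrative assumptions; the summary above does not specify how the paper's dynamic N:M sparsification selects weights.

```python
# Illustrative sketch only; names and design choices are assumptions,
# not taken from the Sparse-BitNet paper or code.

def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization: divide by mean |w|, round, clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def nm_mask(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m consecutive weights."""
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes; ties go to the earlier index.
        keep = sorted(range(m), key=lambda i: (-abs(group[i]), i))[:n]
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


def sparse_ternary(weights, n=2, m=4):
    """Assumed ordering: mask from latent weights, scale from dense weights, then ternarize."""
    mask = nm_mask(weights, n, m)
    quantized, scale = ternary_quantize(weights)
    return [q * k for q, k in zip(quantized, mask)], scale


if __name__ == "__main__":
    latent = [0.9, -0.1, 0.05, -1.2, 0.3, 0.0, -0.7, 0.2]
    values, scale = sparse_ternary(latent)
    print(values, scale)
```

In this toy example the ternary quantizer already maps small weights to zero, so the 2:4 mask removes little that quantization had not already removed. That is one intuition for why ternary weights might pair well with N:M sparsity, offered here as an illustration rather than as the paper's stated explanation.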
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
