BitSnap: Checkpoint Sparsification and Quantization in LLM Training

Yanxin Peng; Qingping Li; Baodong Wu; Shigang Li; Guohao Dai; Shengen Yan; Yu Wang

arXiv:2511.12376 · cs.LG · November 19, 2025

TL;DR

This paper introduces a dynamic checkpoint sparsification and quantization approach for LLM training that significantly reduces storage requirements while maintaining model accuracy, adapting to different training stages and architectures.

Contribution

It presents a novel adaptive compression method combining sparsification and quantization, optimizing storage, speed, and precision during LLM training.

Findings

01. 16x compression ratio with no accuracy loss
02. 2x compression ratio with minimal precision loss
03. Effective adaptation to various training stages and models

Abstract

As large language models (LLMs) continue to grow in size and complexity, efficient checkpoint saving and loading has become crucial for managing storage, memory usage, and fault tolerance in LLM training. Existing work does not optimize these aspects jointly. This paper proposes a novel checkpoint sparsification and quantization method that adapts dynamically to different training stages and model architectures. We present a comprehensive analysis of existing lossy and lossless compression techniques, identify current limitations, and introduce our adaptive approach that balances compression ratio, speed, and precision impact throughout the training process. Experiments on different sizes of LLMs demonstrate that our bitmask-based sparsification method achieves a 16x compression ratio without compromising model accuracy. Additionally, the…
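
The abstract names a bitmask-based sparsification method but does not describe its mechanics. The following is a minimal sketch of one common way such a scheme can work, not the authors' implementation. It assumes a checkpoint is stored as a delta against a base checkpoint, with a one-bit-per-element mask marking which elements changed, followed by the changed values only. The function names `compress_delta`, `decompress_delta`, and `compression_ratio` are hypothetical and chosen for illustration.

```python
import numpy as np

# Illustrative sketch only: the delta-against-a-base design and all names
# here are assumptions, not taken from the paper.

def compress_delta(base, current):
    """Encode `current` relative to `base` as (packed bitmask, changed values)."""
    base_flat = base.ravel()
    cur_flat = current.ravel()
    changed = cur_flat != base_flat      # boolean flag per element
    mask = np.packbits(changed)          # pack flags into 1 bit per element
    values = cur_flat[changed]           # keep only the elements that changed
    return mask, values

def decompress_delta(base, mask, values):
    """Rebuild the current tensor from the base, the bitmask, and the values."""
    flat = base.ravel().copy()
    # `count` trims the padding bits that packbits adds to fill the last byte.
    changed = np.unpackbits(mask, count=flat.size).astype(bool)
    flat[changed] = values
    return flat.reshape(base.shape)

def compression_ratio(current, mask, values):
    """Dense size divided by the size of the mask plus the stored values."""
    return current.nbytes / (mask.nbytes + values.nbytes)
```

In this sketch the mask costs one bit per element regardless of how many elements changed, so the achievable ratio depends on the element width and on the fraction of elements that differ from the base. Reconstruction is exact, since unchanged elements are copied from the base and changed ones are stored verbatim.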

Taxonomy

Topics: Big Data and Digital Economy · Advanced Data Storage Technologies · Natural Language Processing Techniques