BitSnap: Checkpoint Sparsification and Quantization in LLM Training
Yanxin Peng, Qingping Li, Baodong Wu, Shigang Li, Guohao Dai, Shengen Yan, Yu Wang

TL;DR
This paper introduces a dynamic checkpoint sparsification and quantization approach for LLM training that significantly reduces storage requirements while maintaining model accuracy, adapting to different training stages and architectures.
Contribution
It presents a novel adaptive compression method combining sparsification and quantization, optimizing storage, speed, and precision during LLM training.
Findings
Bitmask-based sparsification reaches a 16x compression ratio without compromising model accuracy
2x compression ratio with minimal precision loss
Effective adaptation to various training stages and models
Abstract
As large language models (LLMs) continue to grow in size and complexity, efficient checkpoint saving and loading has become crucial for managing storage, memory usage, and fault tolerance in LLM training. Existing work does not comprehensively optimize all of these aspects together. This paper proposes a novel checkpoint sparsification and quantization method that adapts dynamically to different training stages and model architectures. We present a comprehensive analysis of existing lossy and lossless compression techniques, identify current limitations, and introduce our adaptive approach that balances compression ratio, speed, and precision impact throughout the training process. Experiments on different sizes of LLMs demonstrate that our bitmask-based sparsification method achieves 16x compression ratio without compromising model accuracy. Additionally, the…
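The abstract names a bitmask-based sparsification method but does not spell out its mechanics. As a rough illustration only, the sketch below assumes one plausible reading: a checkpoint is stored as a delta against a previous checkpoint, with one mask bit per element marking whether the value changed, followed by the changed values. The function names (`encode_delta`, `decode_delta`, `compressed_bytes`) and the 2-bytes-per-value default are hypothetical and are not taken from the paper.

```python
def encode_delta(base, new):
    """Return (mask, changed): one bit per element, plus new values where new != base."""
    if len(base) != len(new):
        raise ValueError("checkpoints must have the same number of elements")
    mask = bytearray((len(new) + 7) // 8)  # one bit per element, rounded up to whole bytes
    changed = []
    for i, (b, n) in enumerate(zip(base, new)):
        if b != n:
            mask[i // 8] |= 1 << (i % 8)  # mark element i as changed
            changed.append(n)
    return bytes(mask), changed


def decode_delta(base, mask, changed):
    """Rebuild the new checkpoint from the base checkpoint, the mask, and the changed values."""
    out = list(base)
    values = iter(changed)
    for i in range(len(base)):
        if mask[i // 8] & (1 << (i % 8)):
            out[i] = next(values)
    return out


def compressed_bytes(mask, changed, bytes_per_value=2):
    """Size of the encoded delta, assuming 16-bit values (an assumption of this sketch)."""
    return len(mask) + bytes_per_value * len(changed)


# Usage: two of nine values change between checkpoints.
base = [0.5, 1.0, -2.0, 3.0, 0.0, 0.25, 7.0, 8.0, 9.0]
new = [0.5, 1.0, -2.5, 3.0, 0.0, 0.25, 7.0, 8.0, 9.5]
mask, changed = encode_delta(base, new)
restored = decode_delta(base, mask, changed)
print(restored == new, compressed_bytes(mask, changed), 2 * len(new))
```

Under these assumptions the reconstruction is exact, so the scheme is lossless with respect to the base checkpoint. With a 1-bit mask over 16-bit values, the encoded size approaches 1/16 of the raw size as the fraction of changed elements goes to zero; whether this is how the paper arrives at its reported ratio is not stated in the text above.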
Taxonomy
Topics: Big Data and Digital Economy · Advanced Data Storage Technologies · Natural Language Processing Techniques
