Gaussian Weight Sampling for Scalable, Efficient and Stable Pseudo-Quantization Training
Myeonghwan Ahn, Sungjoo Yoo

TL;DR
This paper introduces Gaussian weight sampling for pseudo-quantization training, enabling scalable, efficient, and stable low-precision training of large language models with minimal computational overhead.
Contribution
It proposes a novel Gaussian weight sampling method for pseudo-quantization training that improves scalability, efficiency, and stability in training large language models.
Findings
Supports low-precision FP parameters down to FP6
Incurs only 1.40% overhead on an A100 GPU
Achieves stable training surpassing the BF16 baseline on GPT2 and Llama2
Abstract
The ever-growing scale of large language models (LLMs) is pushing for improved efficiency, favoring fully quantized training (FQT) over BF16. While FQT accelerates training, it faces consistency challenges and requires searching over an exponential number of cases, each needing over 200B tokens to ensure stability. Pseudo-quantization training (PQT) addresses the issues of FQT, although it is not well-studied. We explore the practical implications of PQT in detail and propose a noise distribution that is floating-point (FP)-friendly, with ideal properties including stochastic precision annealing. As a result, the proposed method serves as an effective theoretical foundation for low-precision FP parameters through PQT, utilizing efficient fake quantization via an addition and subsequent FP casting. We demonstrate that Gaussian weight sampling is (1) scalable: supports low-precision…
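The "addition and subsequent FP casting" step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `noise_scale` parameter, the rule of scaling the noise by the largest magnitude in the block, and the choice of bfloat16 as the cast target are all assumptions made for illustration. The listing here does not give the paper's actual noise scaling or block structure.

```python
import random
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), going through float32.

    Sketch only: NaN, infinity, and values outside the float32 range are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding; add a rounding
    # bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits = (bits >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def pseudo_quantize(weights, noise_scale, rng):
    """Hypothetical fake quantization: one addition of Gaussian noise, then an FP cast.

    Assumption: the noise standard deviation is noise_scale times the largest
    magnitude in the block. The paper may scale its noise differently.
    """
    max_abs = max(abs(w) for w in weights)
    sigma = noise_scale * max_abs
    return [to_bf16(w + sigma * rng.gauss(0.0, 1.0)) for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs sample the same noise
    block = [0.5, -0.25, 0.125, 0.03125]
    print(pseudo_quantize(block, 2.0 ** -4, rng))
```

With `noise_scale` set to zero the function reduces to a plain bfloat16 cast. A larger value mimics a coarser effective precision, which is one way to picture noise standing in for quantization error during training.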
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Neural Network Applications · Natural Language Processing Techniques
