Gaussian Weight Sampling for Scalable, Efficient and Stable Pseudo-Quantization Training

Myeonghwan Ahn; Sungjoo Yoo

arXiv:2505.11170 · cs.LG · May 19, 2025

TL;DR

This paper introduces Gaussian weight sampling for pseudo-quantization training, enabling scalable, efficient, and stable low-precision training of large language models with minimal computational overhead.

Contribution

The paper proposes a novel Gaussian weight sampling method for pseudo-quantization training that improves scalability, efficiency, and stability when training large language models.

Findings

01

Supports low-precision FP parameters down to FP6

02

Incurs only 1.40% overhead on an A100 GPU

03

Achieves stable training that surpasses the BF16 baseline on GPT-2 and Llama 2

Abstract

The ever-growing scale of large language models (LLMs) is pushing for improved efficiency, favoring fully quantized training (FQT) over BF16. While FQT accelerates training, it faces consistency challenges and requires searching over an exponential number of cases, each needing over 200B tokens to ensure stability. Pseudo-quantization training (PQT) addresses these issues of FQT, although PQT itself has not been well studied. We explore the practical implications of PQT in detail and propose a noise distribution $R$ that is floating-point (FP)-friendly, with ideal properties including stochastic precision annealing. As a result, the proposed method serves as an effective theoretical foundation for low-precision FP parameters through PQT, utilizing efficient fake quantization via an addition and subsequent FP casting. We demonstrate that Gaussian weight sampling is (1) scalable: supports low-precision…
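The abstract describes fake quantization as "an addition and subsequent FP casting." The sketch below illustrates only that general shape; it is not the paper's noise distribution $R$, which is not defined in this summary. Assumptions: the Gaussian noise standard deviation is matched to the uniform rounding error of a hypothetical target FP format with `mantissa_bits` mantissa bits, the helpers `fp_step` and `gaussian_weight_sample` are hypothetical names, and NumPy's `float16` stands in for BF16 because NumPy has no native bfloat16 dtype.

```python
import numpy as np


def fp_step(w: np.ndarray, mantissa_bits: int) -> np.ndarray:
    """Hypothetical helper: spacing between adjacent values of an FP format
    with `mantissa_bits` explicit mantissa bits, evaluated at |w|.
    Ignores the target format's exponent range (no subnormals or clamping)."""
    magnitude = np.maximum(np.abs(w), np.finfo(np.float32).tiny)
    exponent = np.floor(np.log2(magnitude))
    return np.exp2(exponent - mantissa_bits).astype(np.float32)


def gaussian_weight_sample(w, mantissa_bits, rng):
    """Hypothetical PQT-style fake quantization: add noise, then cast down."""
    step = fp_step(w, mantissa_bits)
    # Assumption: match the std of uniform rounding error U(-step/2, step/2).
    sigma = step / np.sqrt(12.0)
    noise = rng.standard_normal(w.shape).astype(np.float32)
    # The FP32 master weights `w` are not modified; only the sampled copy
    # is cast. float16 is a stand-in for BF16 here.
    return (w + sigma * noise).astype(np.float16)


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)  # FP32 master weights
# 2 mantissa bits, as in an E3M2 FP6 layout (illustrative choice)
w_tilde = gaussian_weight_sample(w, mantissa_bits=2, rng=rng)
print(w_tilde.dtype, w_tilde.shape)
```

Because the noise scale here is relative to each weight's magnitude, the perturbation resembles floating-point rather than fixed-point rounding error. This is one plausible reading of "FP-friendly" under the stated assumptions, not a description of the authors' implementation.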

Taxonomy

Topics: Speech Recognition and Synthesis · Advanced Neural Network Applications · Natural Language Processing Techniques