Direct Quantized Training of Language Models with Stochastic Rounding

Kaiyan Zhao; Tsuguchika Tabaru; Kenichi Kobayashi; Takumi Honda; Masafumi Yamazaki; Yoshimasa Tsuruoka

arXiv:2412.04787·cs.LG·October 13, 2025

TL;DR

This paper introduces a method for training large language models directly with low-precision quantized weights using stochastic rounding, reducing memory during training while maintaining competitive performance.

Contribution

It proposes training quantized LLMs by updating the low-precision weights directly, without the high-precision weight copy that straight-through estimation requires, enabling memory-efficient training and inference.

Findings

1. Training with ternary weights is feasible.

2. 8-bit quantization matches higher-bit performance.

3. Models are robust to lower precision and memory reduction.

Abstract

Although recent quantized Large Language Models (LLMs), such as BitNet, have paved the way for significant reduction in memory usage during deployment with binary or ternary weights, training these models still demands substantial memory footprints. This is partly because high-precision (i.e., unquantized) weights required for straight-through estimation must be maintained throughout the whole training process. To address this, we explore directly updating the quantized low-precision weights without relying on straight-through estimation during backpropagation, aiming to save memory usage during training. Specifically, we employ a stochastic rounding technique to minimize the information loss caused by the use of low-bit weights throughout training. Experimental results on our LLaMA-structured models of various sizes indicate that (1) training with only low-precision weights is feasible…
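
The abstract names stochastic rounding but the excerpt does not show how it is applied. As a rough illustration only, the sketch below shows the general technique: a real value is rounded down or up to a neighboring grid point, with the probability of rounding up equal to its fractional part, so the rounded value equals the original in expectation. The function names (`stochastic_round`, `quantized_sgd_step`), the unit-spaced integer grid, the plain SGD update, and the clipping to {-1, 0, 1} are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x to floor(x) or floor(x) + 1.

    The probability of rounding up equals the fractional part of x,
    so the expected value of the result is x itself.
    """
    low = math.floor(x)
    frac = x - low
    return low + (1 if rng.random() < frac else 0)


def quantized_sgd_step(weights, grads, lr, rng, q_min=-1, q_max=1):
    """Hypothetical update that stores only quantized weights.

    Each integer weight takes a real-valued gradient step, and the result
    is immediately rounded stochastically back onto the integer grid and
    clipped to [q_min, q_max]. No high-precision copy is kept between steps.
    """
    updated = []
    for w, g in zip(weights, grads):
        candidate = w - lr * g  # transient real value, discarded after rounding
        q = stochastic_round(candidate, rng)
        updated.append(max(q_min, min(q_max, q)))
    return updated


rng = random.Random(0)

# Averaged over many draws, stochastic rounding of 0.3 stays close to 0.3,
# whereas round-to-nearest would always return 0.
samples = [stochastic_round(0.3, rng) for _ in range(100_000)]
mean_rounded = sum(samples) / len(samples)

# One update of a small ternary weight vector with made-up gradients.
ternary_weights = [-1, 0, 1, 0]
new_weights = quantized_sgd_step(ternary_weights, [0.8, -0.4, 0.2, 0.0], lr=0.5, rng=rng)
```

The point of the toy example is that small updates, which round-to-nearest would discard on every step, still move a weight with a probability proportional to their size. A real implementation would operate on tensors and would likely include a scale factor for the quantization grid.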

Code & Models

Repositories

KYuuto1006/DQT (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling