Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding

Taowen Liu; Marta Andronic; Deniz Gündüz; George A. Constantinides

arXiv:2511.00874·cs.LG·November 4, 2025

TL;DR

This paper investigates how stochastic rounding enables efficient low-bit training of large language models by analyzing its interaction with batch size and quantization effects, supported by theoretical and empirical results.

Contribution

The paper provides a theoretical and empirical analysis of stochastic rounding in low-bit LLM training, showing how batch size and the choice of quantized tensors (weights versus activations) influence convergence and accuracy.

Findings

01 Increasing the batch size can compensate for reduced precision during back-propagation.

02 Quantizing weights and quantizing activations affect gradient variance in distinct ways.

03 Experimental results validate the theoretical insights.
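
As a rough illustration of why batch size and precision can trade off, consider the textbook definition of stochastic rounding onto a uniform grid. The notation here is assumed for this sketch and is not taken from the paper: grid spacing $\Delta$, batch size $B$, fixed inputs $x_i$, independent rounding decisions, and $\lfloor x \rfloor_\Delta$ for the largest grid point not exceeding $x$. The paper's own analysis concerns mini-batch SGD gradients and may use different assumptions and constants.

```latex
\[
\mathrm{SR}_\Delta(x) =
\begin{cases}
\lfloor x \rfloor_\Delta + \Delta & \text{with probability } p = \dfrac{x - \lfloor x \rfloor_\Delta}{\Delta},\\[4pt]
\lfloor x \rfloor_\Delta & \text{with probability } 1 - p,
\end{cases}
\]
\[
\mathbb{E}\big[\mathrm{SR}_\Delta(x)\big] = x,
\qquad
\operatorname{Var}\big[\mathrm{SR}_\Delta(x)\big] = \Delta^2\, p\,(1-p) \le \frac{\Delta^2}{4},
\]
\[
\operatorname{Var}\!\left[\frac{1}{B}\sum_{i=1}^{B} \mathrm{SR}_\Delta(x_i)\right]
= \frac{1}{B^2}\sum_{i=1}^{B} \Delta^2\, p_i\,(1-p_i)
\le \frac{\Delta^2}{4B}.
\]
```

In this simplified setting, halving the precision (doubling $\Delta$) multiplies the bound by four, which a fourfold larger batch offsets.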

Abstract

LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a theoretically attractive alternative to deterministic rounding, offering unbiased gradient estimates. However, its interaction with other training factors -- especially batch size -- remains underexplored. In this paper, we present a theoretical and empirical study of mini-batch stochastic gradient descent (SGD) with SR, showing that increased batch sizes can compensate for reduced precision during back-propagation. Furthermore, we show that quantizing weights and activations impacts gradient variance in distinct ways. Our experiments validate these theoretical insights.
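
The unbiasedness claim and the batch-size effect can be checked numerically with a small toy experiment. The sketch below is not the paper's code or experimental setup: the function names `stochastic_round` and `batch_mean_variance`, the grid step of 0.25, and the constant value 0.3 standing in for a per-sample gradient entry are all assumptions chosen for illustration.

```python
import numpy as np


def stochastic_round(x, step, rng):
    """Round x onto multiples of `step`, rounding up with probability
    equal to the fractional distance to the lower grid point."""
    scaled = np.asarray(x, dtype=np.float64) / step
    lower = np.floor(scaled)
    prob_up = scaled - lower  # in [0, 1)
    round_up = rng.random(scaled.shape) < prob_up
    return (lower + round_up) * step


def batch_mean_variance(batch_size, value=0.3, step=0.25, trials=5000, seed=0):
    """Empirical variance of the batch mean of stochastically rounded values.

    Each trial rounds `batch_size` copies of `value` independently and
    averages them, mimicking (very loosely) a mini-batch gradient average.
    """
    rng = np.random.default_rng(seed)
    x = np.full((trials, batch_size), value)
    means = stochastic_round(x, step, rng).mean(axis=1)
    return means.var()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    value, step = 0.3, 0.25

    # Deterministic round-to-nearest is biased for this value: the error
    # is the same for every sample, so averaging cannot remove it.
    nearest = np.round(value / step) * step
    print("round-to-nearest:", nearest)

    # Stochastic rounding is unbiased: the sample mean approaches `value`.
    samples = stochastic_round(np.full(200_000, value), step, rng)
    print("mean of stochastic rounding:", samples.mean())

    # The variance of the batch mean shrinks roughly like 1 / batch_size.
    for b in (1, 4, 16, 64):
        print("batch size", b, "variance", batch_mean_variance(b))
```

Running the script shows the rounding noise of the averaged value falling as the batch grows, while the round-to-nearest error stays fixed. Real training adds sampling noise in the gradients themselves, which this toy deliberately leaves out.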

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis