ECO: Quantized Training without Full-Precision Master Weights

Mahdi Nikdan; Amir Zandieh; Dan Alistarh; Vahab Mirrokni

arXiv:2601.22101 · cs.CL · January 30, 2026

TL;DR

This paper introduces ECO, a novel quantized training method that removes the need for high-precision master weights by applying updates directly to quantized parameters, reducing memory usage while maintaining accuracy.

Contribution

ECO is the first optimizer that eliminates master weights in quantized training through error feedback, enabling memory-efficient training of large models without accuracy loss.

Findings

01

ECO matches baseline accuracy with reduced memory overhead.

02

ECO enables training of large models at precisions as low as INT4.

03

ECO comes with a theoretical proof of convergence under standard assumptions.

Abstract

Quantization has significantly improved the compute and memory efficiency of Large Language Model (LLM) training. However, existing approaches still rely on accumulating their updates in high precision: concretely, gradient updates must be applied to a high-precision weight buffer, known as master weights. This buffer introduces substantial memory overhead, particularly for Sparse Mixture of Experts (SMoE) models, where model parameters and optimizer states dominate memory usage. To address this, we introduce the Error-Compensating Optimizer (ECO), which eliminates master weights by applying updates directly to quantized parameters. ECO quantizes weights after each step and carefully injects the resulting quantization error into the optimizer momentum, forming an error-feedback loop with no additional memory. We prove that, under standard assumptions and a decaying learning…
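
The error-feedback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's algorithm: the uniform round-to-nearest quantizer, the names `quantize` and `eco_like_step`, the hyperparameter values, and in particular the scaling used to fold the error into the momentum are all assumptions made for this example, since the abstract does not state the exact injection rule.

```python
def quantize(x: float, step: float = 0.25) -> float:
    """Round-to-nearest onto a uniform grid (hypothetical stand-in for a low-bit format)."""
    return round(x / step) * step


def eco_like_step(w_q, m, grad, lr=0.1, beta=0.9, step=0.25):
    """One momentum-SGD step that stores only the quantized weight and the momentum."""
    m = beta * m + grad              # standard momentum accumulation
    w_tilde = w_q - lr * m           # high-precision result, used transiently, never stored
    w_new = quantize(w_tilde, step)  # only this quantized value is kept
    err = w_tilde - w_new            # what rounding threw away
    # Assumed injection rule (not taken from the paper): shift the momentum so
    # that the next update, which applies -lr * beta * m, re-adds `err`.
    m = m - err / (lr * beta)
    return w_new, m


if __name__ == "__main__":
    # Toy problem: minimise 0.5 * (w - 0.6)^2, whose gradient is (w - 0.6).
    w, m = 1.0, 0.0
    for _ in range(20):
        w, m = eco_like_step(w, m, grad=w - 0.6)
    print(w, m)
```

Only `w_q` and `m` persist between steps, so no separate high-precision copy of the weights is kept, which is the memory property the abstract describes. The quantizer and the injection scaling used in the paper may differ from this sketch.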

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education