LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

Andrej Jovanović; Alex Iacob; Mher Safaryan; Ionut-Vlad Modoranu; Lorenzo Sani; William F. Shen; Xinchi Qiu; Dan Alistarh; Nicholas D. Lane

arXiv:2602.04396·cs.LG·February 5, 2026

Open Access

TL;DR

LoRDO is a framework that combines low-rank optimization with infrequent communication in distributed training. It reduces communication by approximately 10× while achieving near-parity with low-rank DDP in language modeling.

Contribution

LoRDO introduces a unified approach that addresses the limitations of low-rank optimizers in local-update regimes by incorporating a full-rank quasi-hyperbolic update, enabling efficient distributed training.

Findings

01. Achieves near-parity with low-rank DDP in language modeling.
02. Reduces communication by approximately 10×.
03. Improves performance in low-memory settings with small rank/batch size.

Abstract

Distributed training of foundation models via DDP is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose LoRDO, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. LoRDO achieves near-parity with…
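
The subspace restriction mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes the generic quasi-hyperbolic pattern of mixing the raw gradient with a momentum term, uses a random orthonormal basis `P` in place of a projection computed from (pseudo-)gradients, and uses the projected gradient as a stand-in for a low-rank momentum state. All names (`orthonormal_basis`, `qh_update`, `off_subspace_norm`, `nu`) are hypothetical and chosen for illustration only.

```python
import random


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormal_basis(m, r, rng):
    """Gram-Schmidt on random vectors in R^m; returns an m x r matrix
    with orthonormal columns (assumption: stands in for a learned projection)."""
    cols = []
    while len(cols) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for q in cols:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:
            cols.append([a / norm for a in v])
    return transpose(cols)


def qh_update(grad, P, low_rank_state, nu):
    """Illustrative mix: (1 - nu) * full-rank gradient + nu * P @ low-rank state.
    nu = 1.0 recovers a purely low-rank update."""
    back_projected = matmul(P, low_rank_state)
    return [[(1.0 - nu) * g + nu * b for g, b in zip(g_row, b_row)]
            for g_row, b_row in zip(grad, back_projected)]


def off_subspace_norm(update, P):
    """Frobenius norm of the part of `update` lying outside span(P)."""
    inside = matmul(P, matmul(transpose(P), update))
    return sum((u - p) ** 2
               for u_row, p_row in zip(update, inside)
               for u, p in zip(u_row, p_row)) ** 0.5


rng = random.Random(0)
m, n, r = 6, 4, 2                       # toy sizes: 6 x 4 weight, rank-2 subspace
P = orthonormal_basis(m, r, rng)
grad = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
state = matmul(transpose(P), grad)      # r x n projected gradient (stand-in state)

residual_low_rank = off_subspace_norm(qh_update(grad, P, state, nu=1.0), P)
residual_qh = off_subspace_norm(qh_update(grad, P, state, nu=0.9), P)

print("outside-subspace norm, low-rank only:", residual_low_rank)
print("outside-subspace norm, with full-rank term:", residual_qh)
```

With `nu = 1.0` the update is `P @ P^T @ grad`, so it has no component outside the span of `P` up to floating-point error; any `nu < 1.0` adds a full-rank gradient term whose component outside the subspace is generally nonzero, which is one way to read "restoring subspace exploration".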

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications