LoRDO: Distributed Low-Rank Optimization with Infrequent Communication
Andrej Jovanović, Alex Iacob, Mher Safaryan, Ionut-Vlad Modoranu, Lorenzo Sani, William F. Shen, Xinchi Qiu, Dan Alistarh, Nicholas D. Lane

TL;DR
LoRDO is a framework that combines low-rank optimization with infrequent communication in distributed training. It cuts communication by roughly 10× while staying close to low-rank DDP on language modeling.
Contribution
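LoRDO unifies low-rank optimization with infrequent synchronization. In the local-update regime, workers lack the full-batch gradients needed to compute low-rank projections. LoRDO addresses this limitation by adding a full-rank quasi-hyperbolic update, so that the optimizer is not permanently confined to a single low-rank subspace.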
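A quasi-hyperbolic update, in the style of QHM, mixes a plain gradient step with a momentum step. The sketch below is a minimal illustration of how a full-rank gradient term could be blended with momentum kept in a low-rank subspace. It is an assumption for exposition, not LoRDO's published rule: `qh_lowrank_step`, `nu`, `beta`, and the orthonormal basis `P` are hypothetical names.

```python
import numpy as np


def qh_lowrank_step(W, G, P, M, lr=1e-2, beta=0.9, nu=0.7):
    """Hypothetical QH step: low-rank momentum plus a full-rank gradient term.

    W: (m, n) weights, G: (m, n) gradient, P: (m, r) orthonormal basis,
    M: (r, n) momentum stored in the projected space.
    """
    R = P.T @ G                     # project the gradient to rank r
    M = beta * M + (1 - beta) * R   # EMA momentum kept at shape (r, n)
    low_rank_dir = P @ M            # lift the momentum back to shape (m, n)
    # nu weights the momentum; (1 - nu) weights the raw full-rank gradient,
    # so directions outside span(P) still receive part of the update.
    update = nu * low_rank_dir + (1 - nu) * G
    return W - lr * update, M


rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
W = rng.standard_normal((m, n))
G = rng.standard_normal((m, n))
P, _ = np.linalg.qr(rng.standard_normal((m, r)))  # (m, r), orthonormal columns
M = np.zeros((r, n))

W_new, M = qh_lowrank_step(W, G, P, M)
delta = W - W_new
outside = delta - P @ (P.T @ delta)  # component of the step orthogonal to span(P)
print(np.linalg.norm(outside) > 0)   # True
```

Setting `nu=1.0` recovers a purely low-rank momentum step with no component outside span(P). Any `nu < 1` lets the full-rank gradient move the weights in directions that the projection would otherwise discard.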
Findings
Achieves near-parity with low-rank DDP in language modeling.
Reduces communication by approximately 10×.
Improves performance in low-memory settings with small ranks or batch sizes.
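A back-of-the-envelope count shows where the savings can come from. The sketch below is a minimal calculator under stated assumptions: it covers a single (m, n) weight matrix, counts only payload floats per worker, ignores all-reduce constant factors, and assumes that projection matrices are not sent and that only a rank-r pseudo-gradient is synchronized once every H local steps. The function names and the sizes used are hypothetical. The calculator does not reproduce the paper's measured ~10× figure, whose baseline and settings are not given here.

```python
def floats_per_round(m, n, r, H):
    """Floats sent per worker per H-step round for one (m, n) weight matrix.

    Illustrative assumptions: counts only payload elements, ignores all-reduce
    constant factors, and assumes projection matrices are not communicated.
    """
    return {
        "full_rank_ddp": H * m * n,   # full gradient all-reduced every step
        "low_rank_ddp": H * r * n,    # rank-r projected gradient every step
        "low_rank_local_H": r * n,    # one rank-r pseudo-gradient per round
    }


def adam_state_floats(m, n, r):
    """Adam stores two moments; a projected variant keeps them at (r, n) plus an (m, r) basis."""
    return {"full_rank": 2 * m * n, "low_rank": 2 * r * n + m * r}


# Hypothetical sizes: one 4096 x 4096 layer, rank 512, synchronizing every 16 steps.
comm = floats_per_round(4096, 4096, 512, 16)
state = adam_state_floats(4096, 4096, 512)
print(comm["full_rank_ddp"] // comm["low_rank_ddp"])     # 8  (= m / r)
print(comm["low_rank_ddp"] // comm["low_rank_local_H"])  # 16 (= H)
print(state)  # {'full_rank': 33554432, 'low_rank': 6291456}
```

Under these assumptions, low-rank projection shrinks each message by m / r, and infrequent synchronization divides the number of messages by H. The two factors compound.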
Abstract
Distributed training of foundation models is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose LoRDO, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. LoRDO achieves near-parity with…
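The claim that global projections computed from pseudo-gradients lock the trajectory into one subspace can be checked numerically. The following is a minimal NumPy sketch with several assumptions. Random matrices stand in for stochastic gradients. Each projection is the top-r left singular subspace of the averaged pseudo-gradient, a GaLore-style choice that may differ from LoRDO's exact rule. Workers take plain projected SGD steps, and the outer step applies the averaged pseudo-gradient with learning rate 1. `top_r_basis` and `local_round` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 10, 8, 2       # matrix shape and projection rank (illustrative)
K, H, lr = 4, 5, 0.1     # workers, local steps per round, inner learning rate


def top_r_basis(X, r):
    """Orthonormal basis for the top-r left singular subspace of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r]


def local_round(W, P, H, lr, rng):
    """One worker: H projected SGD steps; returns its pseudo-gradient."""
    W_local = W.copy()
    for _ in range(H):
        G = rng.standard_normal(W.shape)     # stand-in for a stochastic gradient
        W_local -= lr * P @ (P.T @ G)        # step restricted to span(P)
    return W - W_local


W0 = rng.standard_normal((m, n))
W = W0.copy()
P0 = top_r_basis(rng.standard_normal((m, n)), r)  # initial global projection
P = P0
for _ in range(3):
    delta = np.mean([local_round(W, P, H, lr, rng) for _ in range(K)], axis=0)
    W = W - delta                  # outer step with outer learning rate 1
    P = top_r_basis(delta, r)      # new global projection from the pseudo-gradient

leak = np.linalg.norm(P - P0 @ (P0.T @ P))                # how far span(P) moved from span(P0)
drift = W - W0
drift_leak = np.linalg.norm(drift - P0 @ (P0.T @ drift))  # weight change outside span(P0)
print(leak < 1e-10, drift_leak < 1e-10)  # True True
```

Every local step lies in span(P), so the averaged pseudo-gradient does too, and its SVD returns the same subspace in every round. The full-rank quasi-hyperbolic update described above is the paper's remedy for this lock-in.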
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications
