NoLoCo: No-all-reduce Low Communication Training Method for Large Models

Jari Kolehmainen; Nikolay Blagoev; John Donaghy; Oğuzhan Ersoy; Christopher Nies

arXiv:2506.10911 · cs.LG · June 13, 2025

TL;DR

NoLoCo introduces a novel training method for large models that eliminates explicit synchronization, reducing communication overhead and improving convergence speed across various model sizes and cluster scales.

Contribution

It proposes a new optimizer that implicitly synchronizes model weights without collective communication, enabling efficient training on low-bandwidth networks.
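One generic way for replicas to drift toward agreement without any collective operation is pairwise ("gossip") averaging: in each round, every replica mixes its weights with those of a single randomly chosen peer. The sketch below illustrates only that general idea. It is an assumption for illustration, not NoLoCo's optimizer or update rule. The names `gossip_round`, `spread`, and the mixing weight `alpha` are hypothetical.

```python
import random


def gossip_round(weights, alpha=0.5, rng=random):
    """One round of pairwise averaging (illustrative, not NoLoCo's rule).

    weights: list of per-replica parameter vectors (lists of floats), updated in place.
    alpha: mixing weight; 0.5 moves both replicas of a pair to their midpoint.
    """
    n = len(weights)
    order = list(range(n))
    rng.shuffle(order)
    # Pair replicas as (order[0], order[1]), (order[2], order[3]), ...;
    # with an odd count, the last replica sits out this round.
    for i in range(0, n - 1, 2):
        a, b = order[i], order[i + 1]
        wa, wb = weights[a], weights[b]
        # Symmetric mixing keeps wa + wb unchanged, so the global mean is preserved.
        weights[a] = [(1 - alpha) * x + alpha * y for x, y in zip(wa, wb)]
        weights[b] = [(1 - alpha) * y + alpha * x for x, y in zip(wa, wb)]
    return weights


def spread(weights):
    """Largest absolute deviation of any replica coordinate from the global mean."""
    n = len(weights)
    mean = [sum(col) / n for col in zip(*weights)]
    return max(max(abs(x - m) for x, m in zip(w, mean)) for w in weights)


if __name__ == "__main__":
    rng = random.Random(0)
    # 8 hypothetical replicas, each holding a 4-element parameter vector.
    replicas = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
    print("spread before:", spread(replicas))
    for _ in range(20):
        gossip_round(replicas, rng=rng)
    print("spread after: ", spread(replicas))
```

Each replica only ever talks to one peer per round, yet repeated rounds shrink the disagreement between replicas while the average model stays fixed. That is the sense in which weights can be synchronized implicitly rather than through an all-reduce.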

Findings

01. Requires significantly less communication overhead than existing methods.

02. Achieves up to a 4% faster convergence rate.

03. Is effective across a wide range of model sizes and accelerator counts.

Abstract

Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low communication training methods still employ a synchronization step for model parameters, which, when performed over all model replicas, can become costly on a low-bandwidth network. In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly…
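The claim that a synchronization step over all replicas becomes costly on a low-bandwidth network can be made concrete with a textbook latency-bandwidth (alpha-beta) cost model. The sketch below compares a ring all-reduce with a single point-to-point exchange with one peer. The function names, the cost model, and the numbers (1B fp32 parameters, 1 Gbit/s links, 50 ms latency) are hypothetical illustrations, not figures from the paper.

```python
def ring_allreduce_time(n, size_bytes, bandwidth_bps, latency_s):
    """Alpha-beta estimate for a ring all-reduce over n replicas.

    A ring all-reduce runs 2(n-1) sequential steps (reduce-scatter, then
    all-gather); in each step every replica sends one 1/n chunk of the buffer.
    """
    chunk_bits = size_bytes * 8 / n
    return 2 * (n - 1) * (latency_s + chunk_bits / bandwidth_bps)


def pairwise_exchange_time(size_bytes, bandwidth_bps, latency_s):
    """One full-buffer exchange with a single peer, assuming full-duplex links
    so both directions proceed in parallel. Does not depend on n."""
    return latency_s + size_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    # Hypothetical setup: 1B parameters in fp32, 1 Gbit/s links, 50 ms latency.
    size = 1_000_000_000 * 4
    bw = 1e9
    lat = 0.05
    for n in (8, 64, 512):
        print(n, round(ring_allreduce_time(n, size, bw, lat), 2),
              round(pairwise_exchange_time(size, bw, lat), 2))
```

Under this model the bandwidth terms differ by less than a factor of two, since a ring sends about 2(n-1)/n of the buffer per replica. The growing gap comes from the latency term, which scales linearly with n for the ring and stays constant for a single peer exchange. The model also ignores that a blocking collective forces every replica to wait for the slowest participant.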

Code & Models

Repositories

gensyn-ai/noloco
pytorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis