Heterogeneous Low-Bandwidth Pre-Training of LLMs
Yazan Obeidi, Amir Sarfi, Joel Lidin, Paul Janson, Eugene Belilovsky

TL;DR
This paper explores a novel heterogeneous distributed training framework for large language models that combines low-bandwidth communication techniques with model parallelism, enabling efficient pretraining across diverse hardware setups.
Contribution
It introduces a heterogeneous training framework integrating SparseLoCo with pipeline parallelism and activation compression, improving efficiency in low-bandwidth environments.
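As a rough illustration of what compressing inter-stage traffic by subspace projection can look like, the sketch below projects an activation vector onto a small fixed orthonormal basis, transmits only the coefficients, and reconstructs an approximation on the receiving pipeline stage. The basis, the function names (`compress`, `decompress`), and the toy dimensions are assumptions made for illustration and are not taken from the paper. The same pair of functions could in principle be applied to activation gradients on the backward pass.

```python
# Illustrative sketch only: names, basis, and sizes are hypothetical,
# not the paper's implementation.

def compress(vector, basis):
    """Project a d-dimensional vector onto r orthonormal basis rows (r < d)."""
    return [sum(b * x for b, x in zip(row, vector)) for row in basis]


def decompress(coeffs, basis):
    """Rebuild a d-dimensional approximation from the r coefficients."""
    dim = len(basis[0])
    return [sum(c * row[i] for c, row in zip(coeffs, basis)) for i in range(dim)]


# Assumed shared basis: two orthonormal rows spanning a 2-D subspace of R^4.
BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

activation = [1.0, 2.0, 3.0, 4.0]      # output of the sending stage
sent = compress(activation, BASIS)      # 2 numbers cross the slow link
received = decompress(sent, BASIS)      # lossy reconstruction on the receiver
```

In this toy case four values shrink to two on the wire. The reconstruction is lossy: it keeps only the component of the activation that lies in the chosen subspace.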
Findings
Activation compression complements SparseLoCo at modest cost.
Selective compression improves the loss-communication tradeoff.
Heterogeneous setup enables scalable LLM pretraining with limited bandwidth.
Abstract
Pre-training large language models (LLMs) increasingly requires distributed compute, yet bandwidth constraints make it difficult to scale beyond well-provisioned datacenters, especially when model parallelism forces frequent, large inter-device communications. We study whether SparseLoCo, a low-communication data parallel method based on infrequent synchronization and sparse pseudo-gradient exchange, can be combined with low-bandwidth pipeline model parallelism via activation and activation-gradient compression. We introduce a heterogeneous distributed training framework where some participants host full replicas on high-bandwidth interconnects, while resource-limited participants are grouped to jointly instantiate a replica using pipeline parallelism with subspace-projected inter-stage communication. To make the recently introduced subspace pipeline compression compatible with…
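The phrase "infrequent synchronization and sparse pseudo-gradient exchange" can be made concrete with a minimal sketch. It assumes each replica trains locally for several steps, forms a pseudo-gradient as the difference between the shared parameters and its local parameters, and sends only the k largest-magnitude entries. The function names, the plain averaging rule, and the `outer_lr` parameter are hypothetical choices for illustration, not the SparseLoCo implementation.

```python
# Illustrative sketch only: a generic top-k sparse pseudo-gradient exchange.
# All names and the simple averaging outer step are assumptions.

def pseudo_gradient(global_params, local_params):
    """Difference between shared parameters and a replica's local parameters."""
    return [g - l for g, l in zip(global_params, local_params)]


def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries as an {index: value} mapping."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return {i: vec[i] for i in order[:k]}


def aggregate(sparse_updates, dim):
    """Average sparse updates from all replicas into one dense vector."""
    avg = [0.0] * dim
    for update in sparse_updates:
        for i, value in update.items():
            avg[i] += value / len(sparse_updates)
    return avg


def outer_step(global_params, avg_update, outer_lr=1.0):
    """Apply the averaged pseudo-gradient to the shared parameters."""
    return [g - outer_lr * a for g, a in zip(global_params, avg_update)]


global_params = [1.0, 2.0, 3.0, 4.0]
replica_a = [0.5, 2.0, 3.1, 4.0]   # parameters after local steps on replica A
replica_b = [1.0, 1.0, 3.0, 4.2]   # parameters after local steps on replica B

updates = [
    top_k_sparsify(pseudo_gradient(global_params, r), k=1)
    for r in (replica_a, replica_b)
]
new_global = outer_step(global_params, aggregate(updates, len(global_params)))
```

Each replica here transmits a single index-value pair per synchronization instead of the full four-entry vector; the small entries that are dropped are simply discarded in this sketch.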
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Big Data and Digital Economy
