Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL
Erfan Miahi, Eugene Belilovsky

TL;DR
This paper introduces PULSE, which exploits the sparsity of per-step weight updates in distributed RL to cut communication costs; its weight-synchronization patches are lossless at BF16 precision.
Contribution
The paper proposes compute-visible sparsification (transmit only the updates that would change the next forward pass) and two algorithms built on it, PULSESync and PULSELoCo, to improve communication efficiency in distributed RL training.
Findings
PULSESync reduces weight-synchronization communication by over 100x.
PULSELoCo decreases trainer-to-trainer communication by over 17x.
Approximately 99% of weight updates are invisible after BF16 casting, enabling sparsification.
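As a rough illustration of the last finding, the sketch below emulates a BF16 cast (round-to-nearest-even on the upper 16 bits of an FP32 value) using only the Python standard library. The helper names to_bf16_bits and is_compute_visible, the example weight, and the update sizes are assumptions made for illustration and are not taken from the paper. BF16 keeps 8 bits of significand precision, so the spacing between representable values near 0.5 is 2^-8 (about 0.0039); an update far smaller than that leaves the cast weight unchanged.

```python
import struct

def to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (round-to-nearest-even).

    Illustrative helper; NaN inputs are not handled specially.
    """
    # Reinterpret x as an IEEE-754 float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that truncating the low 16 bits rounds to nearest,
    # with ties going to the even BF16 value.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def is_compute_visible(w_old: float, w_new: float) -> bool:
    """True if the update changes the BF16 value used in the forward pass."""
    return to_bf16_bits(w_old) != to_bf16_bits(w_new)

w = 0.5                 # hypothetical FP32 master weight
tiny_update = 1e-6      # hypothetical small update, far below the BF16 spacing
large_update = 1e-2     # hypothetical update larger than the BF16 spacing

print(is_compute_visible(w, w + tiny_update))
print(is_compute_visible(w, w + large_update))
```

The first update stays in the FP32 master weight and can accumulate over later steps, but it does not alter the BF16 weight that a forward pass would read.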
Abstract
Bandwidth-constrained distributed reinforcement learning (RL) post-training of large language models is bottlenecked by two channels: weight synchronization from trainers to inference workers, and gradient or pseudo-gradient synchronization across trainers. We find that approximately 99% of per-step weight updates are invisible after the BF16 cast used by standard training and inference forward passes. We explain this sparsity by showing that, at typical RL post-training learning rates, Adam updates often fall below the local BF16 rounding threshold. We turn this observation into an algorithmic principle called compute-visible sparsification: transmit only updates that would change the next forward pass. PULSE (Precision-gated Updates for Low-precision Sparse Exchange) turns this principle into two communication algorithms: PULSESync sends lossless sparse BF16 weight patches from…
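The compute-visible principle described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's PULSESync implementation: the function names make_patch and apply_patch, the dictionary patch format, and the toy weight values are all hypothetical. The sender compares the BF16 cast of each weight before and after an update and transmits only the entries whose BF16 value changed; the receiver overwrites those entries and ends up with exactly the weights a full BF16 transfer would have produced.

```python
import struct

def to_bf16_bits(x: float) -> int:
    """BF16 bit pattern of x with round-to-nearest-even (NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF

def make_patch(old_master, new_master):
    """Map index -> new BF16 bits, only where the BF16 cast changed."""
    patch = {}
    for i, (old_w, new_w) in enumerate(zip(old_master, new_master)):
        new_bits = to_bf16_bits(new_w)
        if new_bits != to_bf16_bits(old_w):
            patch[i] = new_bits
    return patch

def apply_patch(bf16_weights, patch):
    """Return a copy of the receiver's BF16 weights with the patch applied."""
    updated = list(bf16_weights)
    for i, new_bits in patch.items():
        updated[i] = new_bits
    return updated

# Hypothetical FP32 master weights before and after one optimizer step.
old_master = [0.5, -1.25, 3.0, 0.001]
new_master = [0.5 + 1e-6, -1.25 - 1e-6, 3.0 + 0.05, 0.001 + 1e-7]

# The receiver (for example, an inference worker) holds the old BF16 weights.
receiver = [to_bf16_bits(w) for w in old_master]

patch = make_patch(old_master, new_master)
receiver = apply_patch(receiver, patch)

# Lossless at BF16 precision: same result as sending every weight.
full_cast = [to_bf16_bits(w) for w in new_master]
print(len(patch), "of", len(new_master), "entries sent")
print(receiver == full_cast)
```

A real system would operate on tensors and use a compact index encoding rather than a Python dictionary; the sketch only shows why skipping updates that are invisible in BF16 loses nothing for the next forward pass.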
Taxonomy
Topics: Software-Defined Networks and 5G · Speech Recognition and Synthesis · Advanced Neural Network Applications
