MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, Jianxiang Zhang, Xinyang Liu, Beirong Zhou, Zeng Shu, Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang

TL;DR
MindSpeed RL introduces a distributed dataflow system for scalable and efficient reinforcement learning training on Ascend NPU clusters, significantly improving throughput and resource utilization.
Contribution
The paper presents a novel distributed dataflow architecture for RL training that reduces overhead and memory usage, enabling large-scale training on NPU clusters.
Findings
Achieved 1.42 to 3.97 times throughput increase over state-of-the-art systems.
Effectively reduced dispatch overhead and redundant memory in RL training.
Demonstrated strong performance on large models using Ascend NPUs.
Abstract
Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents dataflow between nodes. Owing to the heavy cross-node dependencies, the RL training system usually suffers from poor cluster scalability and low memory utilization. In this article, we introduce MindSpeed RL, an effective and efficient system for large-scale RL training. Unlike existing centralized methods, MindSpeed RL organizes the essential data dependencies in RL training, i.e., sample flow and resharding flow, from a distributed view. On the one hand, a distributed transfer dock strategy, which sets controllers and warehouses on the basis of the conventional replay buffer, is designed to release the dispatch overhead in the sample flow. A practical…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Reinforcement Learning in Robotics · IoT and Edge/Fog Computing
