MSRL: Distributed Reinforcement Learning with Dataflow Fragments
Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter Pietzuch, Lei Chen

TL;DR
MSRL is a flexible distributed reinforcement learning system that decouples algorithms from execution strategies using dataflow fragments. This enables scalable training across large GPU clusters without modifying algorithm implementations.
Contribution
The paper presents MSRL, a system that uses dataflow fragments to distribute RL training across clusters abstractly and flexibly, rather than relying on the hard-coded distribution strategies of existing systems.
Findings
Supports distribution policies without algorithm changes
Scales RL training to 64 GPUs
Subsumes existing distribution strategies
Abstract
Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g. policy network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training computation is parallelised and distributed on cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a…
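The abstract is cut off before it defines dataflow fragments, so the following is only a minimal Python sketch of the idea it does state: the RL algorithm is written once, and a separate distribution policy decides how many parallel workers each part of the computation gets. `Fragment`, `shard`, `run`, and the toy `collect`/`update` steps are hypothetical names, not MSRL or MindSpore APIs, and threads stand in for GPU workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Batch = List[float]


@dataclass(frozen=True)
class Fragment:
    """Hypothetical unit of the training dataflow (not MSRL's real API)."""
    name: str
    fn: Callable[[Batch], Batch]


def collect(params: Batch) -> Batch:
    # Toy stand-in for actors interacting with environments:
    # each input value yields one "return" (elementwise, so safe to split).
    return [2.0 * p for p in params]


def update(returns: Batch) -> Batch:
    # Toy stand-in for a learner: reduce all returns to one value.
    # This is a reduction, so it must run on a single worker.
    return [sum(returns) / len(returns)]


# The algorithm is written once, as an ordered list of fragments.
ALGORITHM = [Fragment("collect", collect), Fragment("update", update)]


def shard(batch: Batch, n: int) -> List[Batch]:
    # Split contiguously so concatenating shard outputs preserves order.
    k, r = divmod(len(batch), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(batch[start:end])
        start = end
    return out


def run(algorithm: List[Fragment], policy: Dict[str, int], batch: Batch) -> Batch:
    # The hypothetical policy maps fragment names to worker counts;
    # the fragment code itself never changes between policies.
    for frag in algorithm:
        workers = max(1, policy.get(frag.name, 1))
        shards = [s for s in shard(batch, workers) if s]  # drop empty shards
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() keeps shard order, so results are concatenated in order.
            batch = [x for part in pool.map(frag.fn, shards) for x in part]
    return batch


if __name__ == "__main__":
    single = run(ALGORITHM, {}, [1.0, 2.0, 3.0, 4.0])
    multi = run(ALGORITHM, {"collect": 4}, [1.0, 2.0, 3.0, 4.0])
    print(single, multi)  # [5.0] [5.0]
```

Both runs use the same `ALGORITHM`; only the policy differs. In this toy, only the elementwise `collect` step can be replicated safely, which hints at why a real system needs to know the structure of each fragment before distributing it.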
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices
