Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination
Haoyang Li, Sheng Lin, Fangcheng Fu, Yuming Zhou, Xiaodong Ji, Yanfeng Zhao, Lefeng Wang, Jie Jiang, Bin Cui

TL;DR
StaleFlow is a system that improves asynchronous reinforcement learning post-training by jointly controlling data staleness and skewness, leading to higher throughput without sacrificing convergence.
Contribution
StaleFlow introduces a unified approach, combining a global staleness protocol with a redesigned system architecture, that addresses data staleness and data skewness simultaneously in asynchronous RL.
Findings
Achieves 1.42-2.68× higher throughput than existing systems.
Maintains convergence while reducing data staleness and skewness.
Demonstrates effectiveness across various RL post-training scenarios.
Abstract
Reinforcement learning (RL) post-training has become pivotal for enhancing the capabilities of modern large models. A recent trend is to develop RL systems with a fully disaggregated architecture, which decouples the three RL phases (rollout, reward, and training) onto separate resources and executes them asynchronously. However, two critical data-level concerns arise: (1) asynchronous execution leads to data staleness in trajectories (the data generated by rollout) as the model parameters used in rollout may not be up to date, which impairs RL convergence; and (2) the length variation of trajectories introduces severe data skewness, leading to workload imbalance and degraded system performance. Existing systems fail to address these two concerns in a unified manner. Techniques that tightly control data staleness often constrain effective data skewness mitigation, while aggressive…
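To make the two data-level concerns concrete, the following minimal sketch measures them on a toy batch of trajectories. It is not StaleFlow's protocol: the `Trajectory` fields, the version-gap definition of staleness, the `max_staleness` bound, and the max-over-mean skew ratio are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical fields, not taken from the paper.
    policy_version: int  # parameter version the rollout used to generate it
    num_tokens: int      # length of the generated trajectory


def staleness(traj: Trajectory, trainer_version: int) -> int:
    """Version gap between the trainer's parameters and the rollout's."""
    return trainer_version - traj.policy_version


def admit(trajs: list[Trajectory], trainer_version: int,
          max_staleness: int) -> list[Trajectory]:
    """Keep only trajectories whose version gap is within the assumed bound."""
    return [t for t in trajs if staleness(t, trainer_version) <= max_staleness]


def skew_ratio(trajs: list[Trajectory]) -> float:
    """Longest trajectory length divided by the mean length (1.0 = balanced)."""
    lengths = [t.num_tokens for t in trajs]
    return max(lengths) / (sum(lengths) / len(lengths))


if __name__ == "__main__":
    batch = [Trajectory(5, 100), Trajectory(3, 300), Trajectory(1, 200)]
    fresh = admit(batch, trainer_version=5, max_staleness=2)
    print(len(fresh), skew_ratio(batch))
```

A tight `max_staleness` discards or delays more trajectories, while a high skew ratio signals that a few long trajectories dominate the workload; this is the tension between the two concerns that the abstract describes.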
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Age of Information Optimization
