DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
Tianhao Hu, Xiangcheng Liu, Youshao Xiao, Yang Zheng, Xuan Huang, Jinrui Ding, Yufei Zhang, Tao Liang, Hongyu Zang, Quan Chen, Yueqing Sun, Wenjie Shi, Chao Zhang, Wei Wang, Qi Gu, Yerui Sun, Yucheng Xie, Xunliang Cai

TL;DR
DORA is a scalable asynchronous RL system for language model training. It uses multi-version streaming rollout to keep long-tailed trajectories from blocking the training pipeline, raising throughput and training speed while preserving convergence.
Contribution
The paper presents DORA, a novel asynchronous RL training system with multi-version streaming rollout, addressing long-tailed trajectories and achieving higher efficiency without losing convergence.
Findings
DORA achieves 2-3x higher throughput than state-of-the-art systems.
In industrial settings, DORA accelerates RL training by 2-4x over synchronous methods.
Open-source models trained with DORA match advanced LLMs on reasoning benchmarks.
Abstract
Reinforcement learning (RL) has become a critical paradigm for LLM post-training, yet the rollout phase -- accounting for 50--80% of total step time -- is bottlenecked by skewed generation: long-tailed trajectories, indispensable for model performance, block the entire training pipeline. Asynchronous training offers a natural remedy by overlapping generation with training, but introduces a fundamental tension between efficiency and algorithmic correctness. We identify three constraints that asynchronous training must satisfy to preserve convergence: intra-trajectory policy consistency, data integrity, and bounded staleness. Existing approaches either fail to intrinsically address the long-tailed trajectory problem, which is further exacerbated by the imbalance characteristic of Mixture-of-Experts models, or deviate from the standard RL training formulation, thereby hindering model convergence. Therefore, we…
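To make two of the constraints named in the abstract concrete, here is a minimal Python sketch of an admission check a trainer might apply to finished rollouts. It is not DORA's implementation: the `Trajectory` record, the per-token version list, and the `max_staleness` parameter are all hypothetical, and the readings of "intra-trajectory policy consistency" (every token of a trajectory comes from one policy version) and "bounded staleness" (that version lags the trainer by at most a fixed number of updates) are assumptions drawn only from the constraint names. Data integrity is left out because the abstract does not say enough to sketch it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical rollout record; not taken from the paper."""
    prompt_id: int
    # Policy version that generated each token of the response.
    token_versions: List[int]


def is_policy_consistent(traj: Trajectory) -> bool:
    # Assumed reading: all tokens were sampled from a single policy version.
    return len(set(traj.token_versions)) == 1


def staleness(traj: Trajectory, trainer_version: int) -> int:
    # Lag between the trainer and the oldest policy version used in the rollout.
    return trainer_version - min(traj.token_versions)


def admissible(traj: Trajectory, trainer_version: int, max_staleness: int) -> bool:
    # Empty rollouts carry no version information, so reject them.
    if not traj.token_versions:
        return False
    lag = staleness(traj, trainer_version)
    return is_policy_consistent(traj) and 0 <= lag <= max_staleness


fresh = Trajectory(prompt_id=0, token_versions=[3, 3, 3])
mixed = Trajectory(prompt_id=1, token_versions=[3, 4])
```

With the trainer at version 4 and `max_staleness=1`, `fresh` would pass the check, while `mixed` would be rejected for spanning two policy versions regardless of the staleness bound.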
