AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu

TL;DR
AReaL introduces a fully asynchronous reinforcement learning system for large language models, significantly improving training efficiency and GPU utilization while maintaining or enhancing reasoning performance.
Contribution
The paper presents an asynchronous RL system that decouples generation from training, supported by system-level optimizations and a staleness-aware PPO variant that keeps training stable and efficient.
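To make the decoupling concrete, here is a minimal sketch of a buffer that sits between rollout workers and the trainer. It assumes each rollout is tagged with the policy version that generated it and that the trainer skips samples lagging by more than a hypothetical `max_staleness` versions. The names `Rollout`, `StalenessBoundedBuffer`, and `max_staleness` are illustrative assumptions, not AReaL's API, and the discard rule stands in for the paper's staleness-aware PPO variant, whose details are not given in this summary.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    """A generated sample tagged with the policy version that produced it."""
    text: str
    policy_version: int


class StalenessBoundedBuffer:
    """Hypothetical buffer: rollout workers push, the trainer pops batches.

    Samples that lag the current policy by more than `max_staleness`
    versions are discarded. This is an illustrative rule only.
    """

    def __init__(self, max_staleness: int):
        self.max_staleness = max_staleness
        self._items = deque()

    def push(self, rollout: Rollout) -> None:
        # Rollout workers never block: they append and keep generating.
        self._items.append(rollout)

    def pop_batch(self, batch_size: int, current_version: int):
        """Return `batch_size` fresh-enough rollouts, or None if too few."""
        fresh = [
            r for r in self._items
            if current_version - r.policy_version <= self.max_staleness
        ]
        if len(fresh) < batch_size:
            # Drop stale samples, keep the fresh ones, and wait for more.
            self._items = deque(fresh)
            return None
        batch, rest = fresh[:batch_size], fresh[batch_size:]
        self._items = deque(rest)
        return batch
```

With `max_staleness=1` and the trainer at version 3, a rollout from version 0 is dropped while rollouts from versions 2 and 3 form the next batch.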
Findings
Achieves up to 2.77× training speedup over synchronous systems.
Maintains or improves reasoning performance on benchmarks.
Enhances GPU utilization through system optimizations.
Abstract
Reinforcement learning (RL) has become a dominant paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training batch are generated by the same model. This approach stabilizes RL training but suffers from severe system-level inefficiency: generation must wait until the longest output in the batch is completed before model updates, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is…
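The inefficiency of synchronous generation described above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, one sequence per generation slot and decode time proportional to output length, so every slot stays occupied until the longest output finishes. The function name `sync_idle_fraction` and the example lengths are hypothetical and not taken from the paper.

```python
def sync_idle_fraction(output_lengths):
    """Fraction of slot-time spent idle in one synchronous generation batch.

    Assumes one sequence per slot and decode time proportional to output
    length, so all slots are held until the longest output completes.
    """
    if not output_lengths:
        raise ValueError("need at least one output length")
    longest = max(output_lengths)
    if longest <= 0:
        raise ValueError("output lengths must be positive")
    busy = sum(output_lengths)               # slot-time spent decoding
    total = longest * len(output_lengths)    # slot-time reserved by the batch
    return 1.0 - busy / total


# Four outputs of very different lengths (made-up numbers).
print(sync_idle_fraction([1000, 2000, 4000, 8000]))  # 0.53125
```

Under these assumptions, more than half of the reserved generation time in the example batch is idle. An asynchronous design lets rollout workers start new generations instead of waiting.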
Code & Models
- 🤗 inclusionAI/AReaL-boba-2-8B-Open (model) · 12 downloads · ♡ 19
- 🤗 inclusionAI/AReaL-boba-2-14B-Open (model) · 15 downloads · ♡ 19
- 🤗 inclusionAI/AReaL-boba-2-14B (model) · 5 downloads · ♡ 21
- 🤗 inclusionAI/AReaL-boba-2-8B (model) · 10 downloads · ♡ 25
- 🤗 inclusionAI/AReaL-boba-2-32B (model) · 10 downloads · ♡ 19
- 🤗 inclusionAI/AReaL-SEA-235B-A22B (model) · 32 downloads · ♡ 5
- 🤗 servantofares/AReaL-SEA-235B-A22B (model) · 5 downloads
Taxonomy
Topics: Natural Language Processing Techniques · Fuzzy Logic and Control Systems · Speech and dialogue systems
Methods: Entropy Regularization · Proximal Policy Optimization
