AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu; Jiaxuan Gao; Xujie Shen; Chen Zhu; Zhiyu Mei; Chuyi He; Shusheng Xu; Guo Wei; Jun Mei; Jiashu Wang; Tongkai Yang; Binhang Yuan; Yi Wu

arXiv:2505.24298·cs.LG·March 3, 2026

TL;DR

AReaL is a fully asynchronous reinforcement learning system for large language models that improves training efficiency and GPU utilization, with up to 2.77× training speedup over synchronous systems, while maintaining or improving reasoning performance.

Contribution

The paper presents an asynchronous RL system that decouples generation from training, combining system-level optimizations with a staleness-aware PPO variant for stable, efficient training.
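
To make the idea of staleness concrete: once generation is decoupled from training, a rollout may have been produced by older weights than the ones currently being trained. The sketch below shows one simple way a trainer could be staleness-aware, by only consuming rollouts whose generating policy is at most a fixed number of versions behind. This summary does not describe the paper's actual mechanism, so every name here (`Rollout`, `select_batch`, `max_staleness`) is hypothetical and the filtering rule is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    sample_id: int
    policy_version: int  # version of the weights that generated this sample


def select_batch(buffer: List[Rollout], current_version: int,
                 max_staleness: int, batch_size: int) -> List[Rollout]:
    """Pick up to batch_size rollouts within the staleness bound, oldest first.

    Hypothetical rule: staleness = current_version - policy_version.
    """
    fresh = [r for r in buffer
             if current_version - r.policy_version <= max_staleness]
    fresh.sort(key=lambda r: r.policy_version)  # stable sort keeps arrival order on ties
    return fresh[:batch_size]


# Rollouts arrive continuously while the trainer has advanced to version 3.
buffer = [Rollout(0, 0), Rollout(1, 2), Rollout(2, 3), Rollout(3, 3)]
batch = select_batch(buffer, current_version=3, max_staleness=1, batch_size=2)
print([r.sample_id for r in batch])
```

In this toy run, the rollout generated at version 0 is excluded as too stale, and the two oldest remaining rollouts form the batch. A real system would also need to decide what to do with excluded samples and how to correct for the remaining off-policy data in the loss.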

Findings

01. Achieves up to 2.77× training speedup over synchronous systems.
02. Maintains or improves reasoning performance on benchmarks.
03. Enhances GPU utilization through system optimizations.

Abstract

Reinforcement learning (RL) has become a dominant paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training batch are generated by the same model. This approach stabilizes RL training but suffers from severe system-level inefficiency: generation must wait until the longest output in the batch is completed before model updates, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is…
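
The inefficiency of synchronous generation described in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch assumes a deliberately simplified model that is not taken from the paper: each output occupies one decoding slot, decoding time is proportional to output length, and a slot sits idle once its output finishes until the longest output in the batch is done. The function name and the example lengths are hypothetical.

```python
def sync_generation_utilization(output_lengths):
    """Fraction of slot-time spent decoding when the batch waits for its longest output.

    Simplified model (an assumption): one output per slot, time proportional to tokens.
    """
    longest = max(output_lengths)
    busy = sum(output_lengths)                 # slot-time spent generating tokens
    total = longest * len(output_lengths)      # every slot is held until the longest finishes
    return busy / total


# Made-up output lengths in tokens; reasoning outputs often vary widely in length.
lengths = [200, 400, 800, 3200]
util = sync_generation_utilization(lengths)
print(f"utilization: {util:.1%}")
```

Under these assumptions, a single long output leaves most slots idle for most of the step, which is the kind of waste that continuously generating rollout workers are meant to avoid.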

Code & Models

Repositories

inclusionai/areal (PyTorch, official)

Datasets

inclusionAI/AReaL-tau2-data (dataset, 297 downloads)

Taxonomy

Topics: Natural Language Processing Techniques · Fuzzy Logic and Control Systems · Speech and dialogue systems

Methods: Entropy Regularization · Proximal Policy Optimization