GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control
Haofeng Xu, Junwei Su, Yukun Tian, Lansong Diao, Zhengping Qian, Chuan Wu

TL;DR
This paper introduces GAC, a method to stabilize asynchronous reinforcement learning training of large language models by controlling gradient alignment, addressing instability caused by stale, correlated gradients.
Contribution
The paper identifies persistently high cosine similarity between consecutive policy gradients as a key signature of instability in asynchronous RL, and proposes GAC to mitigate it with theoretical guarantees.
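To see why alignment between consecutive gradients matters, consider a rough worked identity. It is not taken from the paper: it assumes two plain gradient-descent steps with a fixed learning rate $\eta$, gradients $g_t$ and $g_{t+1}$, and angle $\theta_t$ between them (all notation here is illustrative).

```latex
\cos\theta_t = \frac{\langle g_t,\, g_{t+1}\rangle}{\|g_t\|\,\|g_{t+1}\|},
\qquad
\bigl\|\eta\,(g_t + g_{t+1})\bigr\|^2
  = \eta^2\left(\|g_t\|^2 + \|g_{t+1}\|^2
  + 2\cos\theta_t\,\|g_t\|\,\|g_{t+1}\|\right).
```

Under these assumptions, if both gradients have norm $G$, near-orthogonal updates ($\cos\theta_t \approx 0$) move the parameters by about $\sqrt{2}\,\eta G$ over two steps, while fully aligned updates ($\cos\theta_t \approx 1$) move them by $2\eta G$. The gap widens over longer runs of aligned steps, which is consistent with the overshooting risk the paper describes.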
Findings
GAC stabilizes asynchronous RL training at high staleness levels.
Asynchronous training exhibits persistently high cosine similarity between consecutive policy gradients, whereas synchronized training yields near-orthogonal updates.
GAC matches synchronized training performance despite high asynchrony.
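The cosine-similarity diagnostic behind these findings can be sketched in a few lines. This is a minimal illustration on synthetic vectors, not the paper's code or data: the helper names `cosine_similarity` and `consecutive_cosines` are hypothetical, and the "correlated" sequence is a toy stand-in for stale-aligned gradients, built by adding small noise to one shared direction.

```python
import numpy as np


def cosine_similarity(g_prev, g_curr, eps=1e-12):
    """Cosine of the angle between two flattened gradient vectors."""
    g_prev = np.asarray(g_prev, dtype=np.float64).ravel()
    g_curr = np.asarray(g_curr, dtype=np.float64).ravel()
    denom = np.linalg.norm(g_prev) * np.linalg.norm(g_curr)
    # eps guards against division by zero when a gradient is all zeros.
    return float(np.dot(g_prev, g_curr) / max(denom, eps))


def consecutive_cosines(grads):
    """Cosine similarity for each pair of consecutive gradients in a sequence."""
    return [cosine_similarity(a, b) for a, b in zip(grads[:-1], grads[1:])]


rng = np.random.default_rng(0)
dim = 10_000  # toy dimensionality; real LLM gradients are far larger

# Toy case 1: independent random vectors, which are nearly orthogonal
# in high dimension.
independent = [rng.standard_normal(dim) for _ in range(5)]

# Toy case 2 (assumption): every step shares one dominant direction plus
# small noise, mimicking strongly aligned consecutive updates.
shared = rng.standard_normal(dim)
correlated = [shared + 0.1 * rng.standard_normal(dim) for _ in range(5)]

independent_cos = consecutive_cosines(independent)
correlated_cos = consecutive_cosines(correlated)

print("independent:", np.round(independent_cos, 3))
print("correlated: ", np.round(correlated_cos, 3))
```

In a real training loop, one would flatten and concatenate the per-parameter gradients at each optimizer step and log this value over time. A curve that stays close to 1 would correspond to the aligned regime described above, and one that hovers near 0 to the near-orthogonal regime.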
Abstract
Asynchronous execution is essential for scaling reinforcement learning (RL) to modern large model workloads, including large language models and AI agents, but it can fundamentally alter RL optimization behavior. While prior work on asynchronous RL focuses on training throughput and distributional correction, we show that naively applying asynchrony to policy-gradient updates can induce qualitatively different training dynamics and lead to severe training instability. Through systematic empirical and theoretical analysis, we identify a key signature of this instability: asynchronous training exhibits persistently high cosine similarity between consecutive policy gradients, in contrast to the near-orthogonal updates observed under synchronized training. This stale-aligned gradient effect amplifies correlated updates and increases the risk of overshooting and divergence. Motivated by this…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Memory and Neural Computing
