Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning

Jian Lu

arXiv:2511.18871·cs.LG·May 5, 2026

TL;DR

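This paper introduces an asynchronous RL training framework for LLMs that roughly doubles throughput on NPU platforms, and reaches up to 3x speedup on GPU platforms, without sacrificing on-policy correctness. It does so by separating inference from training deployment and synchronizing model weights once per training iteration.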

Contribution

The paper proposes a periodically asynchronous, on-policy RL training framework that raises training throughput while requiring no modification to standard RL algorithms.

Findings

01 Approximately 2x throughput improvement on NPU platforms

02 Up to 3x speedup on GPU platforms

03 Maintains on-policy correctness without off-policy bias

Abstract

Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention for LLM post-training, yet training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training are co-located on the same devices, and their synchronous execution prevents concurrent inference and training. In this work, we revisit the strategy of separating inference and training deployment, and propose a periodically asynchronous framework that transforms synchronous RL training into an asynchronous producer-consumer pipeline. By synchronising model weights at the beginning of each training iteration and generating all rollouts from the same policy, the proposed framework remains inherently on-policy -- without any modification to standard RL algorithms -- thereby avoiding the off-policy bias introduced by existing asynchronous approaches. We…
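The producer-consumer idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `run_periodic_async`, the integer "policy version" standing in for model weights, and the choice to apply a single update at the end of each iteration are all assumptions made for illustration. The sketch only shows why syncing weights once at the start of an iteration keeps every consumed rollout on-policy, even though generation and consumption overlap in time.

```python
import queue
import threading


def run_periodic_async(num_iterations, prompts_per_iter):
    """Toy periodically asynchronous loop (hypothetical, for illustration only).

    Returns a list of (iteration, policy_version) pairs, one per consumed rollout.
    """
    trainer_version = 0  # stands in for the trainer's current weights
    consumed = []

    for iteration in range(num_iterations):
        # Periodic synchronisation: the inference side copies the trainer's
        # weights once, at the beginning of the iteration.
        inference_version = trainer_version
        rollouts = queue.Queue()

        def produce(version=inference_version):
            # Producer: generate every rollout of this iteration from the
            # same policy version and hand each one over as soon as it is ready.
            for prompt_id in range(prompts_per_iter):
                rollouts.put((version, f"rollout-{prompt_id}"))
            rollouts.put(None)  # sentinel: no more rollouts this iteration

        producer = threading.Thread(target=produce)
        producer.start()

        # Consumer: process rollouts while the producer is still generating.
        while True:
            item = rollouts.get()
            if item is None:
                break
            version, _sample = item
            consumed.append((iteration, version))  # e.g. accumulate gradients here

        producer.join()
        # Assumption: one weight update per iteration, applied only after all
        # rollouts of that iteration have been consumed.
        trainer_version += 1

    return consumed
```

Calling `run_periodic_async(2, 2)` returns `[(0, 0), (0, 0), (1, 1), (1, 1)]`: within each iteration, every rollout carries the version that was synced at the start of that iteration, which is the on-policy property the abstract describes.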

Code & Models

Repositories

janelu9/EasyLLM (GitHub)