Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling

Jongchan Park

arXiv:2605.21557 · stat.ML · May 22, 2026

TL;DR

This paper introduces Adaptive Batch Scaling (ABS), a method that dynamically adjusts batch sizes in reinforcement learning based on policy stability, enabling large-batch training and improved performance.

Contribution

The paper proposes a novel adaptive batch scaling technique guided by Behavioral Divergence, allowing large-batch RL training by adjusting batch size according to policy non-stationarity.

Findings

01. ABS enables large-batch RL training with improved performance.

02. Larger networks and batch sizes together yield better results.

03. Behavioral Divergence effectively measures policy non-stationarity.

Abstract

Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL): beyond a modest threshold, increasing the batch size typically yields diminishing returns or degrades performance because the data distribution is inherently non-stationary. We challenge this view by observing that non-stationarity is not a fixed property of RL but evolves throughout training: early stages exhibit rapid behavioral shifts that demand small batches for plasticity, whereas late stages approach a quasi-stationary regime in which large batches enable precise convergence. Motivated by this observation, we propose Adaptive Batch Scaling (ABS), which dynamically adjusts the effective batch size according to the stability of the learning policy. Central to ABS is Behavioral Divergence, a novel metric that quantifies policy non-stationarity by measuring…
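
The idea of tying batch size to policy stability can be illustrated with a small sketch. This is not the paper's algorithm: the abstract is truncated before it defines Behavioral Divergence, so the code below treats the divergence as an opaque non-negative number supplied by the caller. The function name `adaptive_batch_size`, the exponential mapping, and the parameters `min_batch`, `max_batch`, and `scale` are all hypothetical choices made for illustration only.

```python
import math


def adaptive_batch_size(divergence, min_batch=256, max_batch=16384, scale=0.05):
    """Map a divergence signal to a batch size (hypothetical rule, not from the paper).

    divergence: non-negative number; larger means the policy is changing faster.
                How it is measured is left to the caller (assumption).
    scale:      divergence level at which stability drops to 1/e (assumption).
    """
    if divergence < 0:
        raise ValueError("divergence must be non-negative")

    # Stability lies in (0, 1]: 1 when the policy has stopped changing,
    # close to 0 when behavior is shifting quickly.
    stability = math.exp(-divergence / scale)

    # Interpolate between the bounds in log2 space so that the result
    # is always a power of two between min_batch and max_batch.
    low, high = math.log2(min_batch), math.log2(max_batch)
    return 2 ** round(low + stability * (high - low))


if __name__ == "__main__":
    # Made-up divergence trace: large early in training, shrinking later.
    for step, d in enumerate([0.5, 0.2, 0.05, 0.01, 0.0]):
        print(step, d, adaptive_batch_size(d))
```

With this rule, a rapidly shifting policy is trained on batches near `min_batch`, and the batch grows toward `max_batch` as the divergence signal decays, matching the early-plasticity and late-convergence regimes the abstract describes. A real implementation would also have to decide how often to re-measure the signal and whether to smooth it.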
