Adaptive Policy Synchronization for Scalable Reinforcement Learning

Rodney Lafuente-Mercado

arXiv:2507.10990 · cs.LG · October 21, 2025

TL;DR

This paper presents ClusterEnv, a scalable distributed environment interface for reinforcement learning, and introduces Adaptive Policy Synchronization (APS) to reduce communication overhead while maintaining performance.

Contribution

The paper introduces ClusterEnv, a lightweight distributed environment interface, and proposes APS, a synchronization method that balances policy staleness against communication cost in RL training.

Findings

01. On discrete control tasks, APS maintains performance while reducing synchronization overhead.

02. ClusterEnv supports both on- and off-policy RL methods.

03. ClusterEnv integrates into existing training code with minimal changes.

Abstract

Scaling reinforcement learning (RL) often requires running environments across many machines, but most frameworks tie simulation, training, and infrastructure into rigid systems. We introduce ClusterEnv, a lightweight interface for distributed environment execution that preserves the familiar Gymnasium API. ClusterEnv uses the DETACH pattern, which moves environment reset() and step() operations to remote workers while keeping learning centralized. To reduce policy staleness without heavy communication, we propose Adaptive Policy Synchronization (APS), where workers request updates only when divergence from the central learner grows too large. ClusterEnv supports both on- and off-policy methods, integrates into existing training code with minimal changes, and runs efficiently on clusters. Experiments on discrete control tasks show that APS maintains performance while cutting…
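The APS idea in the abstract, a worker that pulls new weights only when its policy has drifted too far from the learner's, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the ClusterEnv code: the names `Learner`, `Worker`, and `maybe_sync`, the use of KL divergence as the drift measure, the threshold value, and the premise that the worker can cheaply obtain the learner's action probabilities for a probe state.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class Learner:
    """Hypothetical central learner; its 'policy' is just a list of logits."""

    def __init__(self, logits):
        self.logits = list(logits)

    def action_probs(self):
        # Assumed cheap summary of the current policy. For a real network
        # this would be far smaller than the full set of weights.
        return softmax(self.logits)

    def train_step(self, delta):
        # Stand-in for a gradient update.
        self.logits = [l + d for l, d in zip(self.logits, delta)]


class Worker:
    """Hypothetical remote worker holding a possibly stale policy copy."""

    def __init__(self, learner, threshold):
        self.learner = learner
        self.threshold = threshold
        self.logits = list(learner.logits)
        self.sync_count = 0

    def maybe_sync(self):
        """Pull fresh parameters only if divergence exceeds the threshold."""
        divergence = kl_divergence(
            softmax(self.logits), self.learner.action_probs()
        )
        if divergence > self.threshold:
            self.logits = list(self.learner.logits)  # the expensive transfer
            self.sync_count += 1
            return True
        return False


def run_demo(steps=20, threshold=0.05):
    """Train the learner for `steps` updates; return how often the worker synced."""
    learner = Learner([0.0, 0.0, 0.0])
    worker = Worker(learner, threshold)
    for _ in range(steps):
        learner.train_step([0.1, 0.0, -0.1])
        worker.maybe_sync()
    return worker.sync_count


if __name__ == "__main__":
    print("syncs:", run_demo(), "out of 20 learner updates")
```

In this sketch a lower threshold keeps the worker's policy fresher at the price of more transfers, while a higher one trades staleness for less communication; the paper's actual divergence measure and request protocol may differ.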

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

rodlaf/clusterenv (PyTorch, official)

Videos

No videos yet.

Taxonomy

Topics: Smart Grid Security and Resilience · Reinforcement Learning in Robotics · Traffic control and management