SAPG: Split and Aggregate Policy Gradients

Jayesh Singla; Ananye Agarwal; Deepak Pathak

arXiv:2407.20230·cs.LG·July 30, 2024

TL;DR

SAPG is an on-policy reinforcement learning algorithm that makes use of large-scale parallel environments by splitting them into chunks and recombining the collected data via importance sampling. It outperforms PPO and other baselines on a range of challenging tasks.

Contribution

Introduces SAPG, a new on-policy RL method that leverages environment splitting and importance sampling to improve performance with large-scale parallel environments.

Findings

1. SAPG outperforms PPO in various challenging environments.

2. Performance of PPO saturates with increasing environment parallelization.

3. SAPG effectively scales with large numbers of parallel environments.

Abstract

Despite extreme sample inefficiency, on-policy reinforcement learning, aka policy gradients, has become a fundamental tool in decision-making problems. With the recent advances in GPU-driven simulation, the ability to collect large amounts of data for RL training has scaled exponentially. However, we show that current RL methods, e.g. PPO, fail to ingest the benefit of parallelized environments beyond a certain point and their performance saturates. To address this, we propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling. Our algorithm, termed SAPG, shows significantly higher performance across a variety of challenging environments where vanilla PPO and other strong baselines fail to achieve high performance. Website at https://sapg-rl.github.io/
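
The split-and-fuse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_envs` and `clipped_is_surrogate`, the contiguous chunking scheme, and the use of a PPO-style clipped ratio as the importance-sampling correction are all assumptions made for illustration. The sketch only shows how environment indices might be partitioned into chunks and how data gathered under one policy (the behavior policy) could be reweighted so that a different policy (the target policy) can learn from it.

```python
import math


def split_envs(num_envs, num_chunks):
    """Partition env indices 0..num_envs-1 into contiguous, near-equal chunks.

    Hypothetical helper: the paper summary does not say how chunks are formed.
    """
    base, extra = divmod(num_envs, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more env
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def clipped_is_surrogate(logp_target, logp_behavior, advantages, clip=0.2):
    """Mean clipped importance-weighted surrogate over a batch of samples.

    ratio = pi_target(a|s) / pi_behavior(a|s), computed from log-probabilities.
    Assumption: a PPO-style clip is used to bound the ratio; the summary only
    states that chunks are fused "via importance sampling".
    """
    total = 0.0
    for lt, lb, adv in zip(logp_target, logp_behavior, advantages):
        ratio = math.exp(lt - lb)
        clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Example: 10 environments split into 3 chunks.
chunks = split_envs(10, 3)

# Data collected by another chunk's policy is off-policy for the target policy,
# so its contribution is reweighted by the (clipped) probability ratio.
objective = clipped_is_surrogate(
    logp_target=[-1.0, -0.7],
    logp_behavior=[-1.2, -0.6],
    advantages=[0.5, -0.3],
)
```

When the target and behavior log-probabilities coincide, the ratio is 1 and the surrogate reduces to the mean advantage, which is the ordinary on-policy case.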

Taxonomy

Topics: Economic Policies and Impacts · Regional Development and Policy

Methods: Entropy Regularization · Proximal Policy Optimization