SAPG: Split and Aggregate Policy Gradients
Jayesh Singla, Ananye Agarwal, Deepak Pathak

TL;DR
SAPG is an on-policy reinforcement learning algorithm that makes effective use of large-scale parallel environments by splitting them into chunks and aggregating the collected data via importance sampling. It outperforms PPO and other baselines on challenging tasks.
Contribution
Introduces SAPG, a new on-policy RL method that leverages environment splitting and importance sampling to improve performance with large-scale parallel environments.
Findings
SAPG outperforms PPO in various challenging environments.
Performance of PPO saturates with increasing environment parallelization.
SAPG effectively scales with large numbers of parallel environments.
Abstract
Despite extreme sample inefficiency, on-policy reinforcement learning, aka policy gradients, has become a fundamental tool in decision-making problems. With the recent advances in GPU-driven simulation, the ability to collect large amounts of data for RL training has scaled exponentially. However, we show that current RL methods, e.g. PPO, fail to ingest the benefit of parallelized environments beyond a certain point and their performance saturates. To address this, we propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling. Our algorithm, termed SAPG, shows significantly higher performance across a variety of challenging environments where vanilla PPO and other strong baselines fail to achieve high performance. Website at https://sapg-rl.github.io/
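The split-and-aggregate idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract only states that environments are split into chunks and fused back via importance sampling. The function names, the batch layout (`logp_target`, `logp_behavior`, `adv`), the assumption that each chunk's data comes from its own behavior policy, and the reuse of a PPO-style clipped surrogate are all assumptions made here for illustration.

```python
import numpy as np


def split_envs(num_envs, num_chunks):
    """Partition environment indices into contiguous, near-equal chunks."""
    return np.array_split(np.arange(num_envs), num_chunks)


def importance_ratio(logp_target, logp_behavior):
    """Return pi_target(a|s) / pi_behavior(a|s) from log-probabilities."""
    return np.exp(logp_target - logp_behavior)


def clipped_surrogate(ratio, advantages, clip_eps=0.2):
    """PPO-style clipped objective, averaged over samples."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))


def aggregated_objective(own, others, clip_eps=0.2):
    """Fuse one chunk's data with data from other chunks (hypothetical scheme).

    Each batch is a dict with:
      logp_target   - log-prob of the taken actions under the policy being updated
      logp_behavior - log-prob under the policy that collected the data
      adv           - advantage estimates
    Data from other chunks is reweighted by the importance ratio so it can
    contribute to the update of the target policy.
    """
    batches = [own] + list(others)
    ratios = np.concatenate(
        [importance_ratio(b["logp_target"], b["logp_behavior"]) for b in batches]
    )
    advs = np.concatenate([b["adv"] for b in batches])
    return clipped_surrogate(ratios, advs, clip_eps)
```

For data a policy collected itself, the ratio is 1 at the start of an update, so the objective behaves like ordinary PPO. For data from another chunk, the ratio corrects for the mismatch between the collecting policy and the policy being updated, and clipping limits how much strongly off-policy samples can influence the update.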
Taxonomy
Topics: Economic Policies and Impacts · Regional Development and Policy
Methods: Entropy Regularization · Proximal Policy Optimization
