The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks

Walter Mayor; Johan Obando-Ceron; Aaron Courville; Pablo Samuel Castro

arXiv:2506.03404·cs.LG·June 5, 2025

TL;DR

This paper empirically analyzes how parallel data collection strategies in reinforcement learning, especially in PPO, affect performance, network stability, and hyper-parameter sensitivity, emphasizing the importance of data collection choices.

Contribution

The paper provides an empirical study of how scaling the number of parallel environments and the rollout length affects PPO, identifying data collection strategies that improve performance.

Findings

01. Larger datasets improve final performance

02. Scaling parallel environments is more effective than longer rollouts

03. Data collection strategies critically influence agent performance

Abstract

The use of parallel actors for data collection has been an effective technique used in reinforcement learning (RL) algorithms. The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine its impact on network architectures, as well as the hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling…
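
As a rough illustration of the data collection knobs the abstract names, the sketch below assumes the common PPO convention that each iteration's batch holds one transition per environment per rollout step, so its size is the number of parallel environments times the rollout length. The class `CollectionConfig`, the helpers `scale_envs` and `scale_rollout`, the `num_minibatches` setting, and all numeric values are hypothetical and chosen for illustration; none are taken from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CollectionConfig:
    """Hypothetical container for on-policy data-collection settings."""
    num_envs: int         # parallel environments stepped together
    rollout_length: int   # steps collected per environment before an update
    num_epochs: int       # training passes over each collected batch
    num_minibatches: int  # minibatches per pass (assumed, common in PPO code)

    @property
    def batch_size(self) -> int:
        # Transitions gathered per iteration under the assumed convention.
        return self.num_envs * self.rollout_length

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches

    @property
    def gradient_steps_per_iteration(self) -> int:
        # Each epoch visits every minibatch once.
        return self.num_epochs * self.num_minibatches


def scale_envs(cfg: CollectionConfig, factor: int) -> CollectionConfig:
    """Grow the batch by adding parallel environments."""
    return replace(cfg, num_envs=cfg.num_envs * factor)


def scale_rollout(cfg: CollectionConfig, factor: int) -> CollectionConfig:
    """Grow the batch by collecting longer rollouts per environment."""
    return replace(cfg, rollout_length=cfg.rollout_length * factor)


if __name__ == "__main__":
    base = CollectionConfig(num_envs=8, rollout_length=128,
                            num_epochs=4, num_minibatches=4)
    for name, cfg in [("base", base),
                      ("2x envs", scale_envs(base, 2)),
                      ("2x rollout", scale_rollout(base, 2))]:
        print(name, cfg.batch_size, cfg.minibatch_size,
              cfg.gradient_steps_per_iteration)
```

Both scaling helpers yield the same batch size, which is why the two axes can be compared at a matched data budget: the batches differ only in composition, with more independent environments on one side and longer trajectories per environment on the other.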

Taxonomy

Topics: Reinforcement Learning in Robotics · Software-Defined Networks and 5G · Stochastic Gradient Optimization Techniques