Improving Generalization on the ProcGen Benchmark with Simple Architectural Changes and Scale
Andrew Jesson, Yiding Jiang

TL;DR
This paper shows that frame stacking, 3D convolutions, and wider convolutional layers, combined with recent RL advances, significantly improve generalization on the ProcGen benchmark, matching or exceeding state-of-the-art results with a single set of hyperparameters across all environments.
Contribution
The study introduces three simple changes (frame stacking, replacing 2D convolutional layers with 3D convolutional layers, and increasing the number of convolutional kernels per layer) that, combined with recent RL advances, substantially improve generalization on ProcGen.
Findings
37.9% reduction in optimality gap relative to the baseline (from 0.58 to 0.36)
Matches or exceeds state-of-the-art methods
Changes are orthogonal and complementary to existing approaches
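The headline figure can be checked from the two optimality-gap values reported in the abstract (0.58 for the baseline, 0.36 for the modified agent). The snippet below is only an arithmetic sanity check; the function name `relative_reduction` is a hypothetical helper, and the inputs are the rounded values from the abstract, so the result agrees with the reported number only to the stated precision.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional decrease of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Rounded optimality-gap values quoted in the abstract.
baseline_gap = 0.58
improved_gap = 0.36

reduction = relative_reduction(baseline_gap, improved_gap)
print(f"{reduction:.1%}")  # 37.9%
```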
Abstract
We demonstrate that recent advances in reinforcement learning (RL), combined with simple architectural changes, significantly improve generalization on the ProcGen benchmark. These changes are frame stacking, replacing 2D convolutional layers with 3D convolutional layers, and scaling up the number of convolutional kernels per layer. Experimental results using a single set of hyperparameters across all environments show a 37.9% reduction in the optimality gap compared to the baseline (from 0.58 to 0.36). This performance matches or exceeds current state-of-the-art methods. The proposed changes are largely orthogonal, and therefore complementary, to existing approaches for improving generalization in RL. Our results suggest that further exploration in this direction could yield substantial improvements in addressing generalization challenges in deep reinforcement learning.
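To make the first two changes concrete, here is a minimal sketch of frame stacking and of how the stacked frames could be laid out for a 2D versus a 3D convolutional encoder. It is not the authors' implementation: the `FrameStack` class, its method names, the stack depth of 4, and the channel-first layouts are all assumptions chosen for illustration. The 64x64 RGB frame size matches ProcGen's observations.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keeps the k most recent frames (hypothetical helper, not from the paper)."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_frame: np.ndarray) -> None:
        # Repeat the first frame so the stack is full from the first step.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_frame)

    def push(self, frame: np.ndarray) -> None:
        # The deque drops the oldest frame automatically.
        self.frames.append(frame)

    def as_2d_conv_input(self) -> np.ndarray:
        # (k, H, W, C) -> (k*C, H, W): time is folded into the channel axis.
        stacked = np.stack(list(self.frames))
        k, h, w, c = stacked.shape
        return stacked.transpose(0, 3, 1, 2).reshape(k * c, h, w)

    def as_3d_conv_input(self) -> np.ndarray:
        # (k, H, W, C) -> (C, k, H, W): time stays a separate axis that a
        # 3D kernel can slide over alongside height and width.
        return np.stack(list(self.frames)).transpose(3, 0, 1, 2)


stack = FrameStack(num_frames=4)  # stack depth of 4 is an assumed value
stack.reset(np.zeros((64, 64, 3), dtype=np.uint8))
stack.push(np.ones((64, 64, 3), dtype=np.uint8))

print(stack.as_2d_conv_input().shape)  # (12, 64, 64)
print(stack.as_3d_conv_input().shape)  # (3, 4, 64, 64)
```

With the 2D layout, the first convolution mixes all frames at once through its input channels. With the 3D layout, the temporal axis is preserved, so the kernel's extent along time becomes a design choice. The third change, scaling up the number of kernels per layer, would correspond to increasing the output-channel count of each convolution and is not shown here.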
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
Methods: Sparse Evolutionary Training
