PPO Dash: Improving Generalization in Deep Reinforcement Learning
Joe Booth

TL;DR
This paper investigates methods to enhance the generalization of deep reinforcement learning, specifically improving PPO performance on the Obstacle Tower Challenge through empirical evaluation of various techniques.
Contribution
It introduces and empirically assesses improvements to PPO, achieving state-of-the-art results on a challenging randomized environment benchmark.
Findings
Enhanced PPO methods improve generalization in deep RL.
State-of-the-art performance achieved on the Obstacle Tower Challenge.
Best practices identified for training robust RL agents.
Abstract
Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices for the PPO algorithm, using the Obstacle Tower Challenge to empirically study their impact on generalization. Our experiments show that their combination provides state-of-the-art performance on the Obstacle Tower Challenge.
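The seed separation described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from the Obstacle Tower code: the function name `split_seeds`, the pool of 100 seeds, and the 80/10/10 split are all hypothetical choices made for illustration. The point is only that the three sets are disjoint, so an agent is never evaluated on a level layout it was trained on.

```python
import random


def split_seeds(num_seeds, n_val, n_test, rng_seed=0):
    """Partition seeds 0..num_seeds-1 into disjoint train/val/test lists.

    Hypothetical helper: the sizes and the shuffling scheme are assumptions,
    not the allocation used by the Obstacle Tower Challenge.
    """
    if n_val + n_test > num_seeds:
        raise ValueError("validation and test sets exceed the seed pool")
    seeds = list(range(num_seeds))
    # A dedicated Random instance keeps the split reproducible.
    random.Random(rng_seed).shuffle(seeds)
    val = seeds[:n_val]
    test = seeds[n_val:n_val + n_test]
    train = seeds[n_val + n_test:]
    return train, val, test


# Assumed pool of 100 environment seeds, split 80/10/10.
train_seeds, val_seeds, test_seeds = split_seeds(100, n_val=10, n_test=10)
```

Hyperparameters would then be tuned against the validation seeds only, with the test seeds held back for the final reported score.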
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning
Methods: Entropy Regularization · Proximal Policy Optimization
