PPO Dash: Improving Generalization in Deep Reinforcement Learning

Joe Booth

arXiv:1907.06704 · cs.LG · July 29, 2019 · 1 citation

TL;DR

This paper investigates methods to enhance the generalization of deep reinforcement learning, specifically improving PPO performance on the Obstacle Tower Challenge through empirical evaluation of various techniques.

Contribution

It introduces and empirically assesses improvements to PPO, achieving state-of-the-art results on a challenging randomized environment benchmark.

Findings

01. Enhanced PPO methods improve generalization in deep RL.

02. State-of-the-art performance achieved on the Obstacle Tower Challenge.

03. Best practices identified for training robust RL agents.

Abstract

Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as Atari 2600 can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices for the PPO algorithm, using the Obstacle Tower Challenge to empirically study their impact on generalization. Our experiments show that the combination of these improvements provides state-of-the-art performance on the Obstacle Tower Challenge.
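
The separate-seed protocol mentioned in the abstract can be illustrated with a minimal sketch. The seed pool (0 to 99), the split sizes, and the helper name `split_seeds` are assumptions chosen for illustration; they are not taken from the paper or from the challenge's actual configuration.

```python
import random


def split_seeds(seed_pool, n_val, n_test, rng_seed=0):
    """Partition environment seeds into disjoint train/val/test lists.

    Hypothetical helper: the pool and split sizes are illustrative only.
    """
    seeds = list(seed_pool)
    # Shuffle with a fixed RNG seed so the split is reproducible.
    random.Random(rng_seed).shuffle(seeds)
    test = seeds[:n_test]
    val = seeds[n_test:n_test + n_val]
    train = seeds[n_test + n_val:]
    return train, val, test


# Assumed pool of 100 environment seeds, with 5 held out for each of val/test.
train_seeds, val_seeds, test_seeds = split_seeds(range(100), n_val=5, n_test=5)
```

Because the three lists never share a seed, a score on the validation or test seeds measures how well the agent handles layouts it did not see during training, rather than how well it memorized them.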

Code & Models

Repositories

Sohojoe/ppo-dash (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning

Methods: Entropy Regularization · Proximal Policy Optimization