Proximal Policy Optimization Algorithms

John Schulman; Filip Wolski; Prafulla Dhariwal; Alec Radford; Oleg Klimov

arXiv:1707.06347 · cs.LG · August 29, 2017

TL;DR

Proximal Policy Optimization (PPO) introduces a simple policy gradient method for reinforcement learning that, in the authors' experiments, improves sample efficiency and performance on benchmark tasks including simulated robotic locomotion and Atari games.

Contribution

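PPO is a new family of policy gradient methods whose surrogate objective allows multiple epochs of minibatch updates on each batch of sampled data, rather than one gradient update per sample. It keeps some of the benefits of trust region methods (TRPO) while being simpler to implement and showing better sample complexity empirically.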
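Reusing each batch of collected samples for several updates can be sketched as a data schedule. This is a minimal illustration under stated assumptions, not the authors' code: `minibatch_epochs` and its parameters are hypothetical names, and the sketch only shows how one fixed batch is reshuffled and split into minibatches for several epochs; the gradient step itself is left out.

```python
import random

def minibatch_epochs(num_samples, num_epochs, minibatch_size, seed=0):
    """Hypothetical helper: yield (epoch, indices) pairs.

    The same fixed batch of collected samples is reused for every epoch;
    each epoch reshuffles it and splits it into minibatches.
    """
    rng = random.Random(seed)          # seeded so the schedule is reproducible
    indices = list(range(num_samples))
    for epoch in range(num_epochs):
        rng.shuffle(indices)           # new sample order each epoch
        for start in range(0, num_samples, minibatch_size):
            # Slicing copies, so later shuffles do not alter yielded lists.
            yield epoch, indices[start:start + minibatch_size]
```

With 8 collected samples, 3 epochs and minibatches of 4, the generator yields 6 minibatches and every sample is visited once per epoch, 3 times in total, whereas a one-update-per-sample scheme would use each sample only once.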

Findings

01

PPO outperforms other online policy gradient methods on the benchmark tasks tested.

02

PPO strikes a favorable balance between sample complexity and implementation simplicity.

03

Experiments demonstrate PPO's effectiveness on simulated robotic locomotion and Atari game playing.

Abstract

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between…
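The abstract does not write out the surrogate objective. In the paper's main (clipped) variant it is the mean over samples of min(r·A, clip(r, 1 - ε, 1 + ε)·A), where r is the ratio of new to old action probabilities and A is an advantage estimate. A minimal pure-Python sketch that evaluates this quantity is shown here; the function name and the default ε = 0.2 are assumptions for illustration, and a practical implementation would compute it inside an autodiff framework so that it can be maximized by stochastic gradient ascent.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Hypothetical helper: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    r = exp(logp_new - logp_old) is the probability ratio between the
    current policy and the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum gives a pessimistic bound, so moving the ratio
        # outside the clip range cannot increase the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For a sample with ratio 1.5 and advantage 1 the clipped term 1.2 is used, and for a sample with ratio 0.5 and advantage -1 the clipped term -0.8 is used, so the mean over the two is 0.2.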

Code & Models

Datasets

EpicPinkPenguin/procgen
dataset · 327k downloads

Videos

An introduction to Policy Gradient methods - Deep Reinforcement Learning · YouTube

Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization
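Entropy regularization enters PPO as a bonus term in the objective. The paper combines the clipped surrogate with a value-function loss and an entropy bonus, roughly L = L^CLIP - c1·L^VF + c2·S, where L^VF is a squared-error value loss and S is the policy's entropy. The following sketch evaluates that combination for a single sample with a discrete (categorical) policy; the function names and the coefficients c1 = 0.5 and c2 = 0.01 are hypothetical defaults chosen for illustration, not values taken from this page.

```python
import math

def categorical_entropy(probs):
    """Shannon entropy in nats of a discrete action distribution."""
    # Zero-probability actions contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def combined_objective(surrogate, value_pred, value_target, probs, c1=0.5, c2=0.01):
    """Hypothetical per-sample objective to maximize.

    surrogate:    clipped surrogate term for this sample
    value_pred:   value estimate V(s) from the critic
    value_target: target return for V(s)
    probs:        action probabilities of the current policy at s
    """
    value_loss = (value_pred - value_target) ** 2   # squared-error value loss
    entropy_bonus = categorical_entropy(probs)      # rewards exploratory policies
    return surrogate - c1 * value_loss + c2 * entropy_bonus
```

A uniform distribution over four actions has entropy log 4 ≈ 1.386 nats, the maximum for four actions, so a larger c2 pushes the policy toward more exploratory action distributions, while a deterministic policy receives no entropy bonus.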