High-Dimensional Continuous Control Using Generalized Advantage Estimation

John Schulman; Philipp Moritz; Sergey Levine; Michael Jordan; Pieter Abbeel

arXiv:1506.02438 · cs.LG · October 23, 2018 · ICLR · 1.7k cites

TL;DR

This paper introduces a policy gradient method with variance reduction and trust region optimization, enabling effective high-dimensional continuous control in complex 3D robotics tasks with neural network policies.

Contribution

The paper proposes an exponentially weighted advantage estimator, analogous to TD(λ), and combines it with trust region optimization of both the policy and the value function for stable, sample-efficient learning in high-dimensional continuous control tasks.

Findings

01

Achieved successful locomotion and standing policies for simulated robots.

02

Required an amount of simulated experience corresponding to 1-2 weeks of real time for the 3D biped tasks.

03

Demonstrated stable, steady improvement during training on complex 3D locomotion tasks.

Abstract

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(λ). We address the second challenge by using a trust region optimization procedure for both the policy and the value function, which are represented by neural networks. Our approach yields strong empirical results on highly challenging 3D…
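The abstract describes the advantage estimator only in words. As a rough illustration, the sketch below assumes the commonly stated form of generalized advantage estimation: a TD residual δ_t = r_t + γ V(s_{t+1}) − V(s_t), accumulated backwards with weight γλ. The function name `gae_advantages`, the argument layout, and the default values of `gamma` and `lam` are assumptions made for this example and are not taken from this page.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Illustrative sketch of an exponentially weighted advantage estimate.

    rewards: list of rewards r_0 .. r_{T-1} from one trajectory.
    values:  list of value estimates V(s_0) .. V(s_T); the extra final
             entry is the bootstrap value of the last state (0.0 if the
             episode terminated). This layout is an assumption.
    gamma, lam: discount and weighting parameters (assumed defaults).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated future sum.
    for t in reversed(range(len(rewards))):
        # One-step TD residual at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of this and all later residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # Toy trajectory with a zero value function, for illustration only.
    print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
```

Under these assumptions, setting `lam=0` reduces the estimate to the one-step TD residual (lower variance, more bias), while `lam=1` gives the discounted sum of rewards minus the value baseline (less bias, higher variance). Intermediate values trade between the two, which is the bias-variance compromise the abstract refers to.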

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Real-time simulation and control systems