Learning Continuous Control Policies by Stochastic Value Gradients

Nicolas Heess; Greg Wayne; David Silver; Timothy Lillicrap; Yuval Tassa; and Tom Erez

arXiv:1510.09142 · cs.LG · November 2, 2015 · 112 cites

TL;DR

This paper introduces a unified, backpropagation-based framework for learning continuous control policies with stochastic value gradients. The resulting family of policy gradient algorithms ranges from model-free methods with value functions to model-based methods without value functions.

Contribution

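It develops a spectrum of policy gradient algorithms that handle stochastic control by treating noise as an exogenous input. Learned models are evaluated only on observations from the environment rather than on model-predicted trajectories, which limits the impact of compounded model errors and allows models, value functions, and policies to be learned simultaneously.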
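
As a rough sketch of the noise-as-input idea, the stochastic Bellman equation can be written with the policy and dynamics as deterministic functions of sampled noise. The symbols below (policy \pi with parameters \theta, dynamics f, noise variables \eta and \xi with fixed densities \rho, discount \gamma) are illustrative assumptions and may not match the paper's exact notation.

```latex
% Action and next state as deterministic functions of exogenous noise:
%   a = \pi(s, \eta; \theta),   s' = f(s, a, \xi),   \eta \sim \rho(\eta),  \xi \sim \rho(\xi)
\begin{align}
V(s) &= \mathbb{E}_{\rho(\eta)}\Big[\, r\big(s, \pi(s,\eta;\theta)\big)
        + \gamma\, \mathbb{E}_{\rho(\xi)}\big[ V'\big(f(s, \pi(s,\eta;\theta), \xi)\big) \big] \Big] \\
% Because the noise densities do not depend on \theta, the chain rule
% can be applied inside the expectations:
V_\theta &= \mathbb{E}_{\rho(\eta)}\Big[\, r_a\, \pi_\theta
        + \gamma\, \mathbb{E}_{\rho(\xi)}\big[ V'_{s'}\, f_a\, \pi_\theta + V'_\theta \big] \Big]
\end{align}
```

Here subscripts denote partial derivatives and V' is the value at the next time step. Under these assumptions, the gradient is an expectation of ordinary backpropagated derivatives, so it can be estimated from sampled noise.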

Findings

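01. SVG(1) learns a model, a value function, and a policy simultaneously in continuous domains.

02. The algorithms are applied first to a toy stochastic control problem and then to several physics-based control problems in simulation.

03. The framework spans model-free methods with value functions through model-based methods without value functions.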

Abstract

We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
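
The idea of treating stochasticity as a deterministic function of exogenous noise can be illustrated on a one-step toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical linear-Gaussian policy, known additive-noise dynamics (the paper learns its models), and a quadratic reward, and it uses no value function. All function names and parameter values are made up for illustration.

```python
import random


def policy(theta, s, eta, sigma):
    """Stochastic policy written as a deterministic function of noise eta."""
    return theta * s + sigma * eta


def dynamics(s, a, xi, noise_std):
    """Stochastic transition written as a deterministic function of noise xi."""
    return s + a + noise_std * xi


def reward(s_next):
    """Quadratic reward that prefers next states near zero."""
    return -s_next ** 2


def pathwise_gradient(theta, s, sigma=0.3, noise_std=0.2,
                      n_samples=50_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[reward(s')] via the chain rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eta = rng.gauss(0.0, 1.0)   # policy noise
        xi = rng.gauss(0.0, 1.0)    # dynamics noise
        a = policy(theta, s, eta, sigma)
        s_next = dynamics(s, a, xi, noise_std)
        dr_ds_next = -2.0 * s_next  # derivative of reward w.r.t. s'
        df_da = 1.0                 # derivative of dynamics w.r.t. a
        dpi_dtheta = s              # derivative of policy w.r.t. theta
        total += dr_ds_next * df_da * dpi_dtheta
    return total / n_samples


def analytic_gradient(theta, s):
    """Closed form for this toy problem: -2 * s * E[s'], E[s'] = s + theta * s."""
    return -2.0 * s * (s + theta * s)


if __name__ == "__main__":
    print("estimate:", pathwise_gradient(0.5, 1.0))
    print("analytic:", analytic_gradient(0.5, 1.0))
```

Because the noise is sampled independently of theta, each sample contributes an ordinary derivative, and the average should approach the closed-form gradient as the sample count grows.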

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Fault Detection and Control Systems