Continuous control with deep reinforcement learning

Timothy P. Lillicrap; Jonathan J. Hunt; Alexander Pritzel; Nicolas Heess; Tom Erez; Yuval Tassa; David Silver; Daan Wierstra

arXiv:1509.02971 · cs.LG · July 8, 2019 · ICLR · 5.4k cites

TL;DR

This paper introduces an actor-critic algorithm for continuous control based on the deterministic policy gradient, and shows that it solves more than 20 simulated physics tasks, in many cases directly from raw pixel inputs.

Contribution

It adapts the ideas behind Deep Q-Learning to continuous action spaces, presenting a model-free actor-critic algorithm that solves complex simulated physics tasks with a single set of hyper-parameters and can learn from raw pixel inputs.
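As a rough illustration of what "adapting Deep Q-Learning to continuous actions" involves, the two updates of a deterministic-policy-gradient actor-critic can be written as below. This is a sketch in commonly used notation, not text taken from this page: the symbols $Q$ (critic), $\mu$ (actor), their target copies $Q'$ and $\mu'$, the parameters $\theta^Q$ and $\theta^\mu$, and the discount $\gamma$ are all assumptions introduced here for illustration. The key point is that the actor $\mu(s)$ replaces the maximization over actions that Q-learning performs, which is impractical when actions are continuous.

```latex
% Critic: regress Q toward a bootstrapped target built from the target networks Q' and \mu'
y_t = r_t + \gamma \, Q'\!\left(s_{t+1}, \mu'(s_{t+1})\right), \qquad
L(\theta^Q) = \mathbb{E}\!\left[\left(Q(s_t, a_t \mid \theta^Q) - y_t\right)^2\right]

% Actor: deterministic policy gradient, following the critic's gradient with respect to the action
\nabla_{\theta^\mu} J \approx \mathbb{E}\!\left[
  \nabla_a Q(s, a \mid \theta^Q)\big|_{a=\mu(s)} \,
  \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)
\right]
```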

Findings

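01. Solves more than 20 simulated physics tasks with the same learning algorithm, network architecture and hyper-parameters

02. Finds policies competitive with those of a planning algorithm that has full access to the domain dynamics and their derivatives

03. Learns policies end-to-end from raw pixel inputs on many of the tasks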

Abstract

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
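Two of the ideas usually credited for the stability of Deep Q-Learning are experience replay and slowly updated target networks. The sketch below shows how those pieces might look in plain Python, under stated assumptions: the names `ReplayBuffer`, `soft_update` and `td_target`, and the parameters `tau` and `gamma`, are hypothetical and chosen for illustration only, and network parameters are stood in for by flat lists of floats. It is not the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=None):
        # A bounded deque drops the oldest transition once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, so a minibatch mixes
        # transitions from different points in time.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def td_target(reward, next_q, done, gamma=0.99):
    """Bootstrapped critic target: reward plus the discounted next-state value.

    next_q is assumed to be the target critic evaluated at the target
    actor's action in the next state; it is ignored when the episode ended.
    """
    return reward + (0.0 if done else gamma * next_q)
```

In a training loop, transitions would be added after every environment step, a minibatch sampled once the buffer is large enough, and `soft_update` applied to the target actor and critic after each gradient step with a small `tau`.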

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization

Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient