Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

TL;DR
This paper introduces a model-free actor-critic algorithm, based on the deterministic policy gradient, for continuous control. It is shown to work on more than 20 simulated physics tasks and, for many of them, to learn directly from raw pixel inputs.
Contribution
It adapts the ideas behind Deep Q-Learning to continuous action spaces, yielding a model-free actor-critic algorithm that handles complex simulated physics tasks and can learn from raw pixels.
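As a rough sketch of what "adapting deep Q-learning to continuous actions" involves: Q-learning needs a maximization over actions at every step, which is impractical when actions are continuous, so a deterministic actor supplies the action instead. The notation below is an assumption that follows common usage for deterministic policy gradient methods (actor \mu with parameters \theta^\mu, critic Q with parameters \theta^Q, primed symbols for slowly updated target copies, discount \gamma). None of these symbols appear in this summary.

```latex
\begin{align}
y_t &= r_t + \gamma\, Q'\big(s_{t+1},\, \mu'(s_{t+1} \mid \theta^{\mu'}) \,\big|\, \theta^{Q'}\big) \\
L(\theta^Q) &= \mathbb{E}\Big[\big(Q(s_t, a_t \mid \theta^Q) - y_t\big)^2\Big] \\
\nabla_{\theta^\mu} J &\approx \mathbb{E}\Big[\nabla_a Q(s, a \mid \theta^Q)\big|_{a=\mu(s \mid \theta^\mu)}\; \nabla_{\theta^\mu}\, \mu(s \mid \theta^\mu)\Big]
\end{align}
```

The first two lines train the critic by regression toward a bootstrapped target. The third moves the actor's parameters in the direction that increases the critic's value estimate for the actor's own action.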
Findings
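Solves more than 20 simulated physics tasks with a single learning algorithm, network architecture, and set of hyper-parameters
Finds policies competitive with those of a planning algorithm that has full access to the domain dynamics and their derivatives
Learns policies end-to-end from raw pixel inputs on many of the tasks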
Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
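The Deep Q-Learning ideas the abstract refers to are usually taken to be a replay memory and a separate target network. A minimal, standard-library-only sketch of those pieces in an actor-critic setting follows. The names `ReplayBuffer`, `soft_update`, and `td_target` are hypothetical, plain Python callables stand in for the neural networks, and the values of `tau` and `gamma` are illustrative. It is not the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between updates.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online value."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def td_target(reward, next_state, target_actor, target_critic, gamma):
    """Bootstrapped critic target: r + gamma * Q'(s', mu'(s'))."""
    next_action = target_actor(next_state)  # the actor replaces the max over actions
    return reward + gamma * target_critic(next_state, next_action)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000)
    for step in range(10):
        buffer.add(float(step), 0.1 * step, 1.0, float(step + 1))

    # Toy stand-ins for the target networks (assumptions for illustration).
    target_actor = lambda s: 0.5 * s
    target_critic = lambda s, a: s * a

    for state, action, reward, next_state in buffer.sample(4):
        y = td_target(reward, next_state, target_actor, target_critic, gamma=0.99)
        print(f"s={state:.1f} a={action:.1f} target={y:.3f}")

    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))
```

In a full agent, the critic would be regressed toward `td_target`, the actor would follow the critic's gradient with respect to the action, and `soft_update` would be applied to both target networks after every training step with a small `tau`.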
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization
Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient
