Memory-based control with recurrent neural networks
Nicolas Heess, Jonathan J Hunt, Timothy P Lillicrap, David Silver

TL;DR
This paper extends model-free reinforcement learning algorithms for continuous control with recurrent neural networks so that they can handle partially observed control problems, and demonstrates them on a range of memory-dependent tasks, including control directly from pixel observations.
Contribution
It extends two model-free continuous-control algorithms, deterministic policy gradient and stochastic value gradient, with recurrent neural networks trained via backpropagation through time, so that they apply to partially observed control tasks.
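To make the role of recurrence concrete, here is a minimal sketch of a recurrent policy whose hidden state summarizes the observation history. It is an illustration only, not the authors' architecture: it uses a plain Elman-style tanh cell instead of an LSTM, random untrained weights, and hypothetical names (`RecurrentPolicy`, `final_action`). It shows that two episodes ending in the same observation can yield different actions, which a memoryless policy could not do.

```python
import math
import random


class RecurrentPolicy:
    """Toy Elman-style policy (assumption: not the paper's LSTM network).

    h_t = tanh(W_obs o_t + W_hid h_{t-1}),  a_t = tanh(w_act . h_t)
    """

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        init = lambda: rng.uniform(-1.0, 1.0)  # random, untrained weights
        self.w_obs = [[init() for _ in range(obs_dim)] for _ in range(hidden_dim)]
        self.w_hid = [[init() for _ in range(hidden_dim)] for _ in range(hidden_dim)]
        self.w_act = [init() for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim
        self.reset()

    def reset(self):
        # Clear the memory at the start of each episode.
        self.h = [0.0] * self.hidden_dim

    def act(self, obs):
        # Update the hidden state from the current observation and the previous state.
        new_h = []
        for i in range(self.hidden_dim):
            pre = sum(w * o for w, o in zip(self.w_obs[i], obs))
            pre += sum(w * h for w, h in zip(self.w_hid[i], self.h))
            new_h.append(math.tanh(pre))
        self.h = new_h
        # Scalar action squashed into (-1, 1).
        return math.tanh(sum(w * h for w, h in zip(self.w_act, self.h)))


def final_action(policy, observations):
    """Feed a whole observation sequence and return the last action."""
    policy.reset()
    action = None
    for obs in observations:
        action = policy.act(obs)
    return action


policy = RecurrentPolicy(obs_dim=1, hidden_dim=4, seed=0)
# Both episodes end with the same observation [0.0] but differ in their history.
a_after_positive = final_action(policy, [[1.0], [0.0]])
a_after_negative = final_action(policy, [[-1.0], [0.0]])
print(a_after_positive, a_after_negative)
```

Because the final observation is identical in both episodes, any difference between the two actions comes entirely from the hidden state carried over from the first step.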
Findings
Recurrent policies solve diverse memory tasks including noisy sensor integration.
The approach handles high-dimensional pixel observations directly.
Recurrent deterministic and stochastic policies perform similarly on complex tasks.
Abstract
Partially observed control problems are a challenging aspect of reinforcement learning. We extend two related, model-free algorithms for continuous control -- deterministic policy gradient and stochastic value gradient -- to solve partially observed domains using recurrent neural networks trained with backpropagation through time. We demonstrate that this approach, coupled with long short-term memory, is able to solve a variety of physical control problems exhibiting an assortment of memory requirements. These include the short-term integration of information from noisy sensors and the identification of system parameters, as well as long-term memory problems that require preserving information over many time steps. We also demonstrate success on a combined exploration and memory problem in the form of a simplified version of the well-known Morris water maze task. Finally, we show that…
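Backpropagation through time, which the abstract names as the training procedure, can be sketched on a one-unit recurrent network. The example below is a toy under stated assumptions: a scalar tanh recurrence, a squared-error loss on the final hidden state standing in for the paper's actor-critic objectives, and hypothetical names (`forward`, `bptt_grads`). It computes the gradients by unrolling backwards over the sequence and compares them with central finite differences.

```python
import math


def forward(w, u, xs):
    """Run h_t = tanh(w * h_{t-1} + u * x_t) from h_0 = 0 and return all hidden states."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def loss(w, u, xs, target):
    """Toy objective (assumption): squared error on the final hidden state."""
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2


def bptt_grads(w, u, xs, target):
    """Gradients of the toy loss w.r.t. w and u via backpropagation through time."""
    hs = forward(w, u, xs)
    grad_w, grad_u = 0.0, 0.0
    dh = hs[-1] - target  # dL/dh_T
    for t in range(len(xs), 0, -1):
        dpre = dh * (1.0 - hs[t] ** 2)  # backprop through tanh at step t
        grad_w += dpre * hs[t - 1]      # shared weight w is used at every step
        grad_u += dpre * xs[t - 1]      # shared weight u is used at every step
        dh = dpre * w                   # pass the gradient back to h_{t-1}
    return grad_w, grad_u


xs = [0.5, -0.3, 0.8, 0.1]
w, u, target = 0.7, -0.4, 0.25
grad_w, grad_u = bptt_grads(w, u, xs, target)

# Central finite differences as an independent check on the analytic gradients.
eps = 1e-6
fd_w = (loss(w + eps, u, xs, target) - loss(w - eps, u, xs, target)) / (2 * eps)
fd_u = (loss(w, u + eps, xs, target) - loss(w, u - eps, xs, target)) / (2 * eps)
print(grad_w, fd_w)
print(grad_u, fd_u)
```

The key point is that the same weights appear at every time step, so their gradients accumulate contributions from the whole sequence; this is what lets early observations influence how the network is updated.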
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Adaptive Dynamic Programming Control
