Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models
John-Alexander M. Assael, Niklas Wahlström, Thomas B. Schön, Marc Peter Deisenroth

TL;DR
This paper presents a data-efficient, model-based reinforcement learning approach that learns closed-loop control policies directly from high-dimensional pixel observations using deep dynamical models, enabling scalable and autonomous control.
Contribution
It introduces a novel joint learning method for low-dimensional feature embedding and predictive modeling from pixel data, advancing end-to-end reinforcement learning from pixels to control torques.
Findings
Learns control policies quickly from pixel data.
Scales effectively to high-dimensional image observations.
Compares favorably with state-of-the-art RL methods for continuous states and actions in data efficiency and scalability.
Abstract
Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy ("torques") from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model for learning a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL…
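The joint-learning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes a single-layer tanh encoder, a linear decoder, and a linear transition model in the feature space, and it assumes the joint objective is the sum of a reconstruction error and a one-step prediction error measured in image space. All names (`init_params`, `encode`, `decode`, `predict_latent`, `joint_loss`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def init_params(n_pixels, n_latent, n_controls, seed=0):
    """Random initial weights for the (assumed) encoder, decoder, and latent dynamics."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_enc": scale * rng.standard_normal((n_latent, n_pixels)),
        "W_dec": scale * rng.standard_normal((n_pixels, n_latent)),
        "A": scale * rng.standard_normal((n_latent, n_latent)),    # feature-to-feature
        "B": scale * rng.standard_normal((n_latent, n_controls)),  # control-to-feature
    }


def encode(params, y):
    """Map flattened images (T, n_pixels) to low-dimensional features (T, n_latent)."""
    return np.tanh(y @ params["W_enc"].T)


def decode(params, z):
    """Map features (T, n_latent) back to image space (T, n_pixels)."""
    return z @ params["W_dec"].T


def predict_latent(params, z, u):
    """One-step prediction in feature space: z_{t+1} = A z_t + B u_t (assumed linear)."""
    return z @ params["A"].T + u @ params["B"].T


def joint_loss(params, images, controls):
    """Reconstruction error plus one-step prediction error, both in image space.

    images:   (T, n_pixels) sequence of flattened frames
    controls: (T - 1, n_controls) control applied between consecutive frames
    """
    z = encode(params, images)
    recon = decode(params, z)
    # Predict the next feature from the current feature and control, then decode it.
    pred = decode(params, predict_latent(params, z[:-1], controls))
    recon_err = np.mean(np.sum((recon - images) ** 2, axis=1))
    pred_err = np.mean(np.sum((pred - images[1:]) ** 2, axis=1))
    # Minimizing the sum trains the embedding and the dynamics together.
    return recon_err + pred_err, recon_err, pred_err


# Usage on synthetic data (random frames, not a real control task).
rng = np.random.default_rng(1)
images = rng.random((20, 64))            # 20 frames of 8x8 pixels, flattened
controls = rng.standard_normal((19, 1))  # one scalar control per transition
params = init_params(n_pixels=64, n_latent=2, n_controls=1)
total, recon_err, pred_err = joint_loss(params, images, controls)
print(total, recon_err, pred_err)
```

Because both error terms depend on the encoder weights, minimizing their sum shapes the features so that they are predictable as well as reconstructable, which is the property the abstract identifies as crucial for long-term predictions. Training the encoder alone and fitting the dynamics afterwards would optimize only the first term with respect to the embedding.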
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
