Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models
Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, Ville Kyrki, Danica Kragic, Mårten Björkman

TL;DR
This paper introduces GenRL, a data-efficient framework that combines reinforcement learning with latent variable generative models for safe policy training in robotics. It also provides evaluation measures for the generative models that predict policy training performance, and it outperforms two state-of-the-art RL methods.
Contribution
The paper proposes GenRL, a novel approach that integrates latent variable generative models with RL to improve data efficiency and safety in robotic policy learning.
Findings
GenRL outperforms two state-of-the-art RL methods in robotics tasks.
Generative models' characteristics significantly influence policy performance.
Evaluation measures computed on the generative model can predict RL policy success before physical training.
Abstract
We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training…
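The two-part split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear maps, the Gaussian latent distribution, the tanh squashing, the dimensions, and every name (`sub_policy`, `decode`, `act`, `W_mu`, `W_logstd`, `W_dec`) are assumptions chosen only to show the data flow from state to latent action to a sequence of motor actions.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, LATENT_DIM, ACTION_DIM, HORIZON = 4, 2, 3, 5

rng = np.random.default_rng(0)

# (i) Sub-policy parameters: map a state to a Gaussian over the action latent.
# A linear map stands in for the deep network; this is an assumption.
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))
W_logstd = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))

# (ii) Generative-model decoder: map a latent to a sequence of motor actions.
# In the framework this part is trained without supervision on valid action
# sequences; here random weights stand in for a trained decoder.
W_dec = rng.normal(scale=0.1, size=(HORIZON * ACTION_DIM, LATENT_DIM))


def sub_policy(state):
    """Return mean and standard deviation of the latent action distribution."""
    mu = W_mu @ state
    std = np.exp(W_logstd @ state)
    return mu, std


def decode(z):
    """Decode a latent action into a (HORIZON, ACTION_DIM) action sequence."""
    # tanh keeps every motor command in [-1, 1] (an assumed actuator range).
    return np.tanh(W_dec @ z).reshape(HORIZON, ACTION_DIM)


def act(state, rng):
    """Sample a latent action for the state and decode it to motor actions."""
    mu, std = sub_policy(state)
    z = mu + std * rng.standard_normal(LATENT_DIM)
    return z, decode(z)


state = rng.normal(size=STATE_DIM)
z, actions = act(state, rng)
print(z.shape, actions.shape)
```

In this sketch, exploration happens only in the low-dimensional latent space, and every sampled latent is decoded by the same generative model. That mirrors the abstract's claim that the framework exploits prior knowledge about valid sequences of motor actions.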
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
