Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Arunkumar Byravan; Jost Tobias Springenberg; Abbas Abdolmaleki; Roland Hafner; Michael Neunert; Thomas Lampe; Noah Siegel; Nicolas Heess; Martin Riedmiller

arXiv:1910.04142·cs.RO·October 10, 2019·6 cites

TL;DR

This paper introduces a model-based RL method that uses transferable latent dynamics models to enable rapid transfer and adaptation to new tasks, especially in robot manipulation, by imagining future trajectories for policy optimization.

Contribution

It presents a novel algorithm that learns a predictive, action-conditional model from vision and proprioception, facilitating transfer to new tasks with different rewards and distractors.

Findings

01 Significant speed-up in learning in transfer scenarios

02 Robust policy optimization with approximate models

03 Effective in robot manipulation tasks

Abstract

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and…
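The idea of "following the gradient of the estimated value along imagined trajectories" can be illustrated with a toy sketch. The example below is not the paper's implementation: it assumes a hand-written scalar linear latent model, a quadratic reward, a quadratic terminal value, and a one-parameter linear policy, where the paper learns these components from vision and proprioception. All names and constants (`imagined_value_and_grad`, `optimize_gain`, `a`, `b`, `c`, `p`) are hypothetical, and the gradient is computed with hand-derived forward-mode sensitivities rather than automatic differentiation.

```python
def imagined_value_and_grad(k, z0, horizon, a=0.9, b=0.5, c=0.1, p=1.0, gamma=0.95):
    """Return the N-step imagined value and its derivative w.r.t. the policy gain k.

    Assumed toy components (not from the paper):
      model:   z' = a*z + b*u
      policy:  u  = k*z
      reward:  r  = -z^2 - c*u^2
      value:   V(z) = -p*z^2, used to bootstrap at the end of the rollout
    """
    z, dz = z0, 0.0              # latent state and its sensitivity dz/dk
    value, grad = 0.0, 0.0
    discount = 1.0
    for _ in range(horizon):
        u = k * z                # action chosen by the policy
        du = z + k * dz          # du/dk via the product rule
        value += discount * (-(z * z) - c * u * u)
        grad += discount * (-2.0 * z * dz - 2.0 * c * u * du)
        # One imagined step through the model, propagating the sensitivity too.
        z, dz = a * z + b * u, a * dz + b * du
        discount *= gamma
    # Bootstrap with the terminal value estimate.
    value += discount * (-p * z * z)
    grad += discount * (-2.0 * p * z * dz)
    return value, grad


def optimize_gain(z0=1.0, horizon=5, lr=0.02, steps=300):
    """Gradient ascent on the imagined value, starting from a zero policy gain."""
    k = 0.0
    for _ in range(steps):
        _, g = imagined_value_and_grad(k, z0, horizon)
        k += lr * g
    return k


if __name__ == "__main__":
    before, _ = imagined_value_and_grad(0.0, 1.0, 5)
    k_opt = optimize_gain()
    after, _ = imagined_value_and_grad(k_opt, 1.0, 5)
    print(f"gain={k_opt:.3f}  value before={before:.3f}  value after={after:.3f}")
```

Because the policy is improved purely through the model, reward, and value estimates, swapping in a different reward function while keeping the same dynamics is the kind of reuse the transfer scenario in the abstract refers to.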

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings