# Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

**Authors:** Steven Stenberg Hansen

arXiv: 1705.03562 · 2017-05-11

## TL;DR

This paper introduces Deep Episodic Value Iteration (DEVI), a deep meta-reinforcement learning method that combines neural networks with a non-parametric model-based approach, enabling rapid adaptation to new reward and transition dynamics.

## Contribution

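DEVI is a deep meta-RL algorithm that uses a neural network to learn a similarity metric for a non-parametric model-based RL algorithm. The model is trained end-to-end and supports one-shot transfer to changed reward and transition structure, even in high-dimensional state spaces.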
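To make the idea of planning over a similarity metric concrete, here is a minimal sketch of kernel-weighted value iteration over a memory of stored transitions. It is not the paper's architecture: the identity `embed` function stands in for the learned deep network, and `similarity`, `plan`, `chain_memory`, the softmax temperature, and the toy chain task are all hypothetical names and choices made for illustration.

```python
import numpy as np

def similarity(queries, keys, temperature):
    # Softmax over negative squared distances: each row sums to 1.
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def plan(states, actions, rewards, next_states, dones, n_actions,
         gamma=0.9, n_iters=50, temperature=0.01, embed=lambda x: x):
    # `embed` is the identity here (assumption); a learned network would go in its place.
    keys, next_keys = embed(states), embed(next_states)
    masks = [actions == a for a in range(n_actions)]
    v_next = np.zeros(len(states))  # value of each stored next state

    def q_values(query_emb):
        q = np.zeros((len(query_emb), n_actions))
        for a, m in enumerate(masks):
            if m.any():
                # Weight stored transitions that took action a by similarity to the query.
                w = similarity(query_emb, keys[m], temperature)
                q[:, a] = w @ (rewards[m] + gamma * v_next[m])
        return q

    for _ in range(n_iters):
        # Back up values through the stored transitions; terminal next states are worth 0.
        v_next = q_values(next_keys).max(axis=1) * (1.0 - dones)
    return lambda queries: q_values(embed(queries))

def chain_memory(goal):
    # Toy task (assumption): states 1..3 on a line, terminals at 0 and 4,
    # action 0 moves left, action 1 moves right, reward 1 on reaching `goal`.
    rows = []
    for s in (1, 2, 3):
        for a, step in ((0, -1), (1, +1)):
            s2 = s + step
            rows.append((s, a, float(s2 == goal), s2, float(s2 in (0, 4))))
    s, a, r, s2, d = map(np.array, zip(*rows))
    return s[:, None].astype(float), a.astype(int), r, s2[:, None].astype(float), d

query = np.array([[1.0], [2.0], [3.0]])
q_right = plan(*chain_memory(goal=4), n_actions=2)(query)  # reward at the right end
q_left = plan(*chain_memory(goal=0), n_actions=2)(query)   # reward moved to the left end
```

In this sketch, moving the reward only changes the stored transitions. Re-running `plan` on the new memory flips the greedy action with no parameter update, which is the kind of behavior the one-shot transfer claim refers to.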

## Key findings

- DEVI achieves rapid adaptation to changes in reward and transition structures.
- DEVI outperforms traditional model-free methods in transfer tasks.
- DEVI demonstrates effective one-shot transfer in high-dimensional environments.

## Abstract

We present a new deep meta reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03562/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1705.03562/full.md

---
Source: https://tomesphere.com/paper/1705.03562