DeepMDP: Learning Continuous Latent Space Models for Representation Learning

Carles Gelada; Saurabh Kumar; Jacob Buckman; Ofir Nachum; Marc G. Bellemare

arXiv:1906.02736 · cs.LG · June 7, 2019 · 67 cites

TL;DR

This paper introduces DeepMDP, a latent space model for reinforcement learning that is trained to predict rewards and the distribution over next latent states. Training it improves representation quality and RL performance in environments with high-dimensional observations.

Contribution

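The paper proposes DeepMDP, a parameterized latent space model trained by minimizing two losses: reward prediction and next-latent-state prediction. It shows that optimizing these losses guarantees both the quality of the latent representation and the quality of the model of the environment, and it demonstrates benefits on RL tasks.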

Findings

01 DeepMDP recovers latent structure in synthetic environments.

02 Learning DeepMDP as an auxiliary task improves Atari RL performance.

03 Theoretical guarantees connect DeepMDP training objectives to environment modeling.

Abstract

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the latent space as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. We connect these results to prior work in the bisimulation literature, and explore the use of a variety of metrics. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic…
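The two losses named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear encoder, reward head, and transition head, the names `encode`, `latent_reward`, `latent_transition`, and `deepmdp_losses`, and all dimensions are hypothetical. It also assumes a deterministic latent transition, so the distance between distributions over next latent states is replaced by a Euclidean distance between the predicted next latent and the encoded next observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, LATENT_DIM, NUM_ACTIONS = 8, 2, 3

# Hypothetical linear parameterization of the three model components.
params = {
    "encoder": rng.normal(size=(OBS_DIM, LATENT_DIM)),
    "reward": rng.normal(size=(NUM_ACTIONS, LATENT_DIM)),
    "transition": rng.normal(size=(NUM_ACTIONS, LATENT_DIM, LATENT_DIM)),
}


def encode(params, obs):
    """Map observations (B, OBS_DIM) to latent states (B, LATENT_DIM)."""
    return obs @ params["encoder"]


def latent_reward(params, z, actions):
    """Predict a scalar reward per sample from the latent state and action."""
    return np.sum(z * params["reward"][actions], axis=-1)


def latent_transition(params, z, actions):
    """Predict the next latent state (assumed deterministic in this sketch)."""
    return np.einsum("bd,bde->be", z, params["transition"][actions])


def deepmdp_losses(params, obs, actions, rewards, next_obs):
    """Return (reward_loss, transition_loss) averaged over the batch."""
    z = encode(params, obs)
    z_next = encode(params, next_obs)
    # Reward loss: absolute error between predicted and observed reward.
    reward_loss = np.mean(np.abs(latent_reward(params, z, actions) - rewards))
    # Transition loss: Euclidean distance between the predicted next latent
    # and the encoding of the observed next state (assumed stand-in metric).
    predicted = latent_transition(params, z, actions)
    transition_loss = np.mean(np.linalg.norm(predicted - z_next, axis=-1))
    return reward_loss, transition_loss


# Random transitions standing in for environment data.
batch = 16
obs = rng.normal(size=(batch, OBS_DIM))
next_obs = rng.normal(size=(batch, OBS_DIM))
actions = rng.integers(0, NUM_ACTIONS, size=batch)
rewards = rng.normal(size=batch)

reward_loss, transition_loss = deepmdp_losses(params, obs, actions, rewards, next_obs)
print("reward loss:", reward_loss)
print("transition loss:", transition_loss)
```

In practice the two terms would be summed, possibly with weights, and minimized by gradient descent over all three components. The sketch also shows that the transition term alone is trivially zero when the encoder maps every observation to the same point, so it is only meaningful alongside the reward term.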

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Adversarial Robustness in Machine Learning