Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision

Julien Scholz; Cornelius Weber; Muhammad Burhan Hafez; Stefan Wermter

arXiv:2102.05599 · cs.LG · January 19, 2022

TL;DR

This paper enhances model-based reinforcement learning by integrating self-supervised internal state representation constraints, leading to improved performance and enabling pretraining in environments like OpenAI Gym.

Contribution

It introduces a novel method of binding internal state representations to environment states using unsupervised reconstruction and consistency losses, improving stability and performance.

Findings

01

Significant performance improvements in OpenAI Gym environments.

02

Enables effective self-supervised pretraining of MuZero.

03

Stabilizes learning through additional unsupervised constraints.

Abstract

Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go, while remaining relatively sample-efficient. As demonstrated by the MuZero Algorithm, the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance. Notably, MuZero uses internal state representations derived from real environment states for its predictions. In this paper, we bind the model's predicted internal state representation to the environment state via two additional terms: a reconstruction model loss and a simpler consistency loss, both of which work independently and unsupervised, acting as constraints to stabilize the learning process. Our experiments show that this new integration of reconstruction model loss and…
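
The two additional terms described above can be illustrated with a minimal sketch. The abstract does not give the exact form of either loss, so this assumes mean squared error for both, and the `represent`, `decode`, and `dynamics` functions are hypothetical toy stand-ins for learned networks, not the authors' implementation.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_loss(
    decode: Callable[[Vector], Vector],
    predicted_state: Vector,
    observation: Vector,
) -> float:
    """Penalize a predicted internal state that cannot be decoded back
    into the real observation (MSE is an assumption)."""
    return mse(decode(predicted_state), observation)


def consistency_loss(
    represent: Callable[[Vector], Vector],
    predicted_state: Vector,
    observation: Vector,
) -> float:
    """Penalize a predicted internal state that differs from the
    representation of the real observation (MSE is an assumption)."""
    return mse(predicted_state, represent(observation))


# Hypothetical toy functions standing in for learned networks.
def represent(obs: Vector) -> list:
    return [0.5 * x for x in obs]


def decode(state: Vector) -> list:
    return [2.0 * s for s in state]


def dynamics(state: Vector, action: float) -> list:
    return [s + action for s in state]


obs, action, next_obs = [1.0, 2.0], 1.0, [3.0, 5.0]
predicted = dynamics(represent(obs), action)  # predicted next internal state

l_rec = reconstruction_loss(decode, predicted, next_obs)
l_con = consistency_loss(represent, predicted, next_obs)
print(l_rec, l_con)
```

Neither term needs reward or value targets, only observations, which is what makes them usable as unsupervised constraints. The consistency term is the simpler of the two in this sketch because it reuses the representation function instead of requiring a separate decoder.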

Taxonomy

Methods: Batch Normalization · Residual Connection · Convolution · Residual Block · Monte-Carlo Tree Search · Average Pooling · Prioritized Experience Replay · MuZero