TL;DR
This paper improves model-based reinforcement learning by adding self-supervised constraints on MuZero's internal state representations, which improves performance in OpenAI Gym environments and enables self-supervised pretraining.
Contribution
It introduces a novel method of binding internal state representations to environment states using unsupervised reconstruction and consistency losses, improving stability and performance.
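The two auxiliary terms can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the linear stand-in networks `represent`, `dynamics`, and `reconstruct`, the dimensions, and the use of mean squared error for both terms are hypothetical. It only shows the idea of tying a predicted internal state back to the real environment state.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_ACTIONS = 4, 8, 2  # hypothetical sizes

# Hypothetical linear stand-ins for the learned networks.
W_repr = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1
W_dyn = rng.normal(size=(LATENT_DIM, LATENT_DIM + N_ACTIONS)) * 0.1
W_dec = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1


def represent(obs):
    """Map a real observation to an internal state."""
    return np.tanh(W_repr @ obs)


def dynamics(state, action):
    """Predict the next internal state from a state and a discrete action."""
    one_hot = np.eye(N_ACTIONS)[action]
    return np.tanh(W_dyn @ np.concatenate([state, one_hot]))


def reconstruct(state):
    """Decode an internal state back into observation space."""
    return W_dec @ state


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def auxiliary_losses(obs, action, next_obs):
    """Return (reconstruction_loss, consistency_loss) for one transition."""
    predicted_state = dynamics(represent(obs), action)
    # Reconstruction: the decoded predicted state should match the real next observation.
    reconstruction_loss = mse(reconstruct(predicted_state), next_obs)
    # Consistency: the predicted state should match the encoding of the real next observation.
    consistency_loss = mse(predicted_state, represent(next_obs))
    return reconstruction_loss, consistency_loss


obs, next_obs = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
rec_loss, con_loss = auxiliary_losses(obs, action=1, next_obs=next_obs)
```

Neither term needs rewards or other labels, only observed transitions, which is why such losses could in principle be used on their own for pretraining. In a full training setup they would be weighted and added to the usual value, policy, and reward losses; the weighting is left out of this sketch.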
Findings
Significant performance improvements in OpenAI Gym environments.
Enables effective self-supervised pretraining of MuZero.
Stabilizes learning through additional unsupervised constraints.
Abstract
Using a model of the environment, reinforcement learning agents can plan their future moves and achieve superhuman performance in board games like Chess, Shogi, and Go, while remaining relatively sample-efficient. As demonstrated by the MuZero Algorithm, the environment model can even be learned dynamically, generalizing the agent to many more tasks while at the same time achieving state-of-the-art performance. Notably, MuZero uses internal state representations derived from real environment states for its predictions. In this paper, we bind the model's predicted internal state representation to the environment state via two additional terms: a reconstruction model loss and a simpler consistency loss, both of which work independently and unsupervised, acting as constraints to stabilize the learning process. Our experiments show that this new integration of reconstruction model loss and…
Taxonomy
Methods: Batch Normalization · Residual Connection · Convolution · Residual Block · Monte-Carlo Tree Search · Average Pooling · Prioritized Experience Replay · MuZero
