Equivariant MuZero

Andreea Deac; Théophane Weber; George Papamakarios

arXiv:2302.04798 · cs.LG · February 10, 2023

TL;DR

This paper enhances the MuZero reinforcement learning algorithm by integrating environment symmetries into its world-model architecture, improving data efficiency and generalisation, especially in procedurally generated environments with unseen transformations.

Contribution

It introduces an equivariant architecture for MuZero that guarantees equivariance in its action-selection process, demonstrating improved generalisation in rotated maze environments.
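Equivariance in action selection means that if the maze is rotated, the agent's action distribution is rotated in the same way, so the chosen move turns with the maze. The sketch below illustrates that property on a hypothetical toy setup; it is not the paper's architecture. The paper builds symmetries into its network design, whereas this sketch uses a different and simpler technique, group averaging (symmetrization) over the four 90° rotations, applied to a random linear policy. The helper names (`rotate_state`, `rotate_policy`, `make_base_policy`, `symmetrize`) and the action ordering 0=up, 1=right, 2=down, 3=left are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a square grid-maze observation
# and four actions indexed 0=up, 1=right, 2=down, 3=left.
N_ACTIONS = 4


def rotate_state(s: np.ndarray, k: int) -> np.ndarray:
    """Rotate the grid by k quarter-turns counterclockwise."""
    return np.rot90(s, k)


def rotate_policy(p: np.ndarray, k: int) -> np.ndarray:
    """Permute action probabilities to match rotate_state(s, k).

    A counterclockwise quarter-turn maps up->left, right->up, down->right and
    left->down, i.e. action a becomes (a - k) % 4, which is np.roll(p, -k).
    """
    return np.roll(p, -k)


def make_base_policy(n: int, seed: int = 0):
    """Non-equivariant stand-in for a learned policy head: random linear map + softmax."""
    w = np.random.default_rng(seed).normal(size=(N_ACTIONS, n * n))

    def policy(s: np.ndarray) -> np.ndarray:
        logits = w @ s.ravel()
        e = np.exp(logits - logits.max())
        return e / e.sum()

    return policy


def symmetrize(policy):
    """Group-average over the four rotations: pi_eq(s) = mean_k g_k^{-1} . pi(g_k . s)."""

    def policy_eq(s: np.ndarray) -> np.ndarray:
        terms = [rotate_policy(policy(rotate_state(s, k)), -k) for k in range(4)]
        return np.mean(terms, axis=0)

    return policy_eq


# Usage: rotating the maze permutes the action distribution accordingly,
# so the greedy action rotates with the maze.
rng = np.random.default_rng(1)
maze = rng.integers(0, 2, size=(5, 5)).astype(float)
pi_eq = symmetrize(make_base_policy(5))
for k in range(4):
    assert np.allclose(pi_eq(rotate_state(maze, k)), rotate_policy(pi_eq(maze), k))
greedy = int(np.argmax(pi_eq(maze)))
greedy_rotated = int(np.argmax(pi_eq(rotate_state(maze, 1))))
```

Group averaging makes any base network equivariant, but it costs one forward pass per group element (four here). An architecture that is equivariant by construction, as the paper proposes, avoids that overhead.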

Findings

01. Equivariant MuZero outperforms standard MuZero on unseen rotated mazes.

02. Performance gains are robust even with partial equivariance.

03. The approach improves data efficiency and generalisation in procedurally generated environments.

Abstract

Deep reinforcement learning repeatedly succeeds in closed, well-defined domains such as games (Chess, Go, StarCraft). The next frontier is real-world scenarios, where setups are numerous and varied. For this, agents need to learn the underlying rules governing the environment, so as to robustly generalise to conditions that differ from those they were trained on. Model-based reinforcement learning algorithms, such as the highly successful MuZero, aim to accomplish this by learning a world model. However, leveraging a world model has not consistently shown greater generalisation capabilities compared to model-free alternatives. In this work, we propose improving the data efficiency and generalisation capabilities of MuZero by explicitly incorporating the symmetries of the environment in its world-model architecture. We prove that, so long as the neural networks used by MuZero are…
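For a world model, incorporating a symmetry means the dynamics must commute with it: predicting the next state of a rotated maze under the correspondingly rotated action should give the rotated prediction, T(g·s, g·a) = g·T(s, a). The sketch below checks this property for a hypothetical toy dynamics model. It assumes the latent state is the grid itself, so rotations act on it directly, and it again uses group averaging over the four rotations instead of the paper's equivariant network design. All names and the action ordering (0=up, 1=right, 2=down, 3=left) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy world model (not the paper's architecture): the latent state
# is the n x n grid itself and actions are 0=up, 1=right, 2=down, 3=left.


def rotate_state(s: np.ndarray, k: int) -> np.ndarray:
    """Rotate a grid (or grid-shaped latent) by k quarter-turns counterclockwise."""
    return np.rot90(s, k)


def rotate_action(a: int, k: int) -> int:
    """Under a counterclockwise quarter-turn, up->left, right->up, down->right, left->down."""
    return (a - k) % 4


def make_base_transition(n: int, seed: int = 0):
    """Non-equivariant stand-in for a learned dynamics network: one random linear map per action."""
    w = np.random.default_rng(seed).normal(size=(4, n * n, n * n)) / n

    def transition(s: np.ndarray, a: int) -> np.ndarray:
        return np.tanh(w[a] @ s.ravel()).reshape(n, n)

    return transition


def symmetrize_transition(transition):
    """T_eq(s, a) = mean_k g_k^{-1} . T(g_k . s, g_k . a) over the four rotations."""

    def transition_eq(s: np.ndarray, a: int) -> np.ndarray:
        terms = [
            rotate_state(transition(rotate_state(s, k), rotate_action(a, k)), -k)
            for k in range(4)
        ]
        return np.mean(terms, axis=0)

    return transition_eq


# Usage: the symmetrized model commutes with every rotation and every action.
n = 5
maze = np.random.default_rng(1).integers(0, 2, size=(n, n)).astype(float)
t_eq = symmetrize_transition(make_base_transition(n))
for k in range(4):
    for a in range(4):
        lhs = t_eq(rotate_state(maze, k), rotate_action(a, k))
        assert np.allclose(lhs, rotate_state(t_eq(maze, a), k))
```

Because each step commutes with the rotation, multi-step rollouts of the toy model do too, which is the kind of property a planner unrolling the model relies on.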

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications

Methods: Residual Connection · Monte-Carlo Tree Search · Average Pooling · Batch Normalization · Convolution · Prioritized Experience Replay · Residual Block · MuZero