Equivariant MuZero
Andreea Deac, Théophane Weber, George Papamakarios

TL;DR
This paper enhances the MuZero reinforcement learning algorithm by integrating environment symmetries into its model, improving data efficiency and generalisation, especially in procedurally-generated environments with unseen transformations.
Contribution
It introduces an equivariant architecture for MuZero that guarantees equivariance in its action-selection process, demonstrating improved generalisation in rotated maze environments.
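To make the equivariance property concrete, here is a minimal sketch for 90-degree rotations of a square grid maze. It is not the architecture from the paper: it uses plain group averaging over the four rotations to turn an arbitrary policy into an equivariant one, which is just one simple way to obtain the property. The action order (0=up, 1=right, 2=down, 3=left), the random linear base policy, and all function names are assumptions made for illustration.

```python
import numpy as np

# Assumed action order for this sketch: 0=up, 1=right, 2=down, 3=left.
NUM_ACTIONS = 4


def rotate_obs(obs, k):
    """Rotate a square grid observation by k * 90 degrees counter-clockwise."""
    return np.rot90(obs, k)


def rotate_policy(policy, k):
    """Permute action probabilities to match k counter-clockwise rotations.

    One rotation sends up->left, right->up, down->right, left->down,
    i.e. action a moves to index (a - 1) mod 4.
    """
    return np.roll(policy, -k)


def make_base_policy(size, seed=0):
    """Hypothetical non-equivariant policy: random linear layer + softmax."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(NUM_ACTIONS, size * size))

    def policy(obs):
        logits = weights @ obs.reshape(-1)
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()

    return policy


def symmetrize(policy):
    """Average over the four rotations, undoing each rotation on the output."""

    def equivariant_policy(obs):
        outputs = [
            rotate_policy(policy(rotate_obs(obs, k)), -k) for k in range(4)
        ]
        return np.mean(outputs, axis=0)

    return equivariant_policy


if __name__ == "__main__":
    base = make_base_policy(size=5)
    equivariant = symmetrize(base)
    maze = np.random.default_rng(1).integers(0, 2, size=(5, 5)).astype(float)
    for k in range(4):
        lhs = equivariant(rotate_obs(maze, k))   # act on the rotated maze
        rhs = rotate_policy(equivariant(maze), k)  # rotate the original action
        print(k, np.allclose(lhs, rhs))
```

For the symmetrized policy, acting on a rotated maze gives the same result as rotating the action distribution chosen on the original maze, so behaviour on an unseen rotation is determined by behaviour on the training orientation. The base policy has no such guarantee.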
Findings
Equivariant MuZero outperforms standard MuZero on unseen rotated mazes.
Performance gains are robust even with partial equivariance.
The approach improves data efficiency and generalisation in procedurally-generated environments.
Abstract
Deep reinforcement learning repeatedly succeeds in closed, well-defined domains such as games (Chess, Go, StarCraft). The next frontier is real-world scenarios, where setups are numerous and varied. For this, agents need to learn the underlying rules governing the environment, so as to robustly generalise to conditions that differ from those they were trained on. Model-based reinforcement learning algorithms, such as the highly successful MuZero, aim to accomplish this by learning a world model. However, leveraging a world model has not consistently shown greater generalisation capabilities compared to model-free alternatives. In this work, we propose improving the data efficiency and generalisation capabilities of MuZero by explicitly incorporating the symmetries of the environment in its world-model architecture. We prove that, so long as the neural networks used by MuZero are…
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications
Methods: Residual Connection · Monte-Carlo Tree Search · Average Pooling · Batch Normalization · Convolution · Prioritized Experience Replay · Residual Block · MuZero
