A model-based approach to meta-Reinforcement Learning: Transformers and tree search
Brieuc Pinon, Jean-Charles Delvenne, Raphaël Jungers

TL;DR
This paper introduces a model-based meta-RL approach using Transformers and tree search, demonstrating better exploration and exploitation on the Alchemy benchmark than model-free methods.
Contribution
It develops a novel model-based meta-RL algorithm with a Transformer encoder for environment dynamics and online planning, advancing exploration strategies.
Findings
Outperforms previous model-free RL methods on Alchemy benchmark
Shows Transformer effectively models complex latent space dynamics
Highlights the importance of model-based planning in meta-RL
Abstract
Meta-learning is a line of research that develops the ability to leverage past experiences to efficiently solve new learning problems. Meta-Reinforcement Learning (meta-RL) methods demonstrate a capability to learn behaviors that efficiently acquire and exploit information in several meta-RL problems. In this context, the Alchemy benchmark has been proposed by Wang et al. [2021]. Alchemy features a rich structured latent space that is challenging for state-of-the-art model-free RL methods. These methods fail to learn to properly explore then exploit. We develop a model-based algorithm. We train a model whose principal block is a Transformer Encoder to fit the symbolic Alchemy environment dynamics. Then we define an online planner with the learned model using a tree search method. This algorithm significantly outperforms previously applied model-free RL methods on the symbolic…
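The planning step described in the abstract (search a tree of actions using predictions from a learned dynamics model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which tree search variant the authors use, so this shows a generic depth-limited expectimax search, and `toy_model` is a hypothetical hand-written stand-in for the trained Transformer model, not the Alchemy dynamics.

```python
from typing import Callable, List, Sequence, Tuple

# A model maps (state, action) to a list of (probability, next_state, reward).
# In the paper this role is played by a learned Transformer-based model;
# here it is only a hypothetical interface chosen for the sketch.
Outcomes = List[Tuple[float, int, float]]
Model = Callable[[int, int], Outcomes]


def plan(model: Model, state: int, actions: Sequence[int], depth: int) -> Tuple[float, int]:
    """Depth-limited expectimax: return (estimated value, best first action)."""
    best_value, best_action = float("-inf"), actions[0]
    for action in actions:
        value = 0.0
        # Average over the outcomes the model predicts for this action.
        for prob, next_state, reward in model(state, action):
            future = 0.0
            if depth > 1:
                future = plan(model, next_state, actions, depth - 1)[0]
            value += prob * (reward + future)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def toy_model(state: int, action: int) -> Outcomes:
    """Hypothetical two-state dynamics (NOT Alchemy), for illustration only."""
    if state == 0:
        if action == 0:
            return [(1.0, 0, 1.0)]           # small, certain reward
        return [(0.5, 1, 0.0), (0.5, 0, 0.0)]  # unrewarded attempt to reach state 1
    if action == 0:
        return [(1.0, 1, 5.0)]               # large reward once in state 1
    return [(1.0, 0, 0.0)]                   # go back to state 0


# A one-step search picks the immediate reward; a deeper search accepts
# an unrewarded step to reach the more valuable state.
shallow = plan(toy_model, 0, [0, 1], depth=1)
deep = plan(toy_model, 0, [0, 1], depth=3)
```

In this toy setting the shallow search selects action 0 and the depth-3 search selects action 1, which shows why a planner with lookahead can prefer an action with no immediate payoff. The search cost grows exponentially with depth, so a practical planner would need pruning or sampling; how the authors handle this is not stated in the text above.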
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Dropout · Label Smoothing · Softmax
