Learning to Play Atari in a World of Tokens
Pranav Agarwal, Sheldon Andrews, Samira Ebrahimi Kahou

TL;DR
This paper introduces DART, a transformer-based method that uses discrete representations for sample-efficient model-based reinforcement learning on Atari. It outperforms previous methods that do not use look-ahead search and exceeds human scores in 9 of 26 games.
Contribution
DART is the first to utilize discrete abstract representations in transformer-based reinforcement learning, enhancing modeling of complex, discrete world properties and improving sample efficiency.
Findings
DART achieves a median human-normalized score of 0.790 on Atari 100k.
DART outperforms previous state-of-the-art methods that do not use look-ahead search.
DART surpasses human performance in 9 out of 26 Atari games.
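For context, the human-normalized score commonly reported on Atari 100k rescales each game's raw score so that a random policy maps to 0 and the human reference maps to 1; the median is then taken across games. The following is a minimal sketch assuming this standard definition. The function names, game names, and scores are made-up placeholders, not results from the paper.

```python
from statistics import median


def human_normalized(agent: float, random: float, human: float) -> float:
    """Rescale a raw score so random play maps to 0.0 and the human reference to 1.0."""
    if human == random:
        raise ValueError("human and random reference scores must differ")
    return (agent - random) / (human - random)


def median_hns(scores: dict) -> float:
    """Median human-normalized score over a dict of game -> (agent, random, human)."""
    return median(human_normalized(a, r, h) for a, r, h in scores.values())


# Hypothetical numbers for illustration only (not taken from the paper).
example = {
    "game_a": (50.0, 0.0, 100.0),  # normalized 0.5
    "game_b": (30.0, 10.0, 20.0),  # normalized 2.0, above the human reference
    "game_c": (5.0, 5.0, 25.0),    # normalized 0.0, same as random play
}
print(median_hns(example))  # 0.5
```

Under this definition, a game counts as above human level when its normalized score exceeds 1.0.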
Abstract
Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial…
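To make the idea of a discrete representation concrete, the sketch below maps continuous feature vectors to the index of their nearest entry in a small codebook, so each feature becomes an integer token rather than a point that can be interpolated. This is an illustrative assumption about how tokens might be obtained (a vector-quantization-style lookup); the abstract does not specify the tokenizer, and the codebook values and function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(vectors, codebook):
    """Return, for each vector, the index of the nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


# Hypothetical 2-D codebook with three entries (illustration only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2]
```

The resulting integer sequence is the kind of input a transformer can model autoregressively, one token at a time.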
Peer Reviews
Decision: ICML 2024 Poster
- The main strength of this paper is the clarity of the idea and the presentation. The paper combines various existing approaches in a way that is not too complicated to understand or implement.
- The novelty over IRIS, the previous approach with discrete tokens, lies in learning a policy on latent states rather than on reconstructed observations. The advantage of not using reconstructed observations is that it is a lot more computationally ef…
These are not weaknesses per se, but the reviewer thinks the paper can be improved in these respects:
- The approach is simple (which is good) and integrates components that already exist in the literature. Learning a world model on discrete tokens was previously introduced in IRIS, and using a ViT policy head (which the authors claim to be their main novelty) has been studied by Yoon et al. 2023 (https://arxiv.org/abs/2302.04419). Usage of a memory token to feed past context has also be…
- The paper is well-written and easy to follow.
- The ablation study provided in the paper is well-designed and informative.
- I need more clarification about the motivation that long-range dependencies impede Dreamer's learning. Could the authors provide experiments comparing the world model accuracy of the proposed method and prior non-transformer-based methods on some long-range tasks?
- I think the world model accuracy should be measured in more detail (e.g., future-state prediction accuracy, reward prediction accuracy, etc.) to fully support the authors' arguments on the RNN-based world model and Transformer-ba…
* The paper proposes a transformer-based architecture for world modeling and policy learning and shows it is quite effective on Atari 100k.
* The paper conducts extensive experiments on Atari 100k and provides many metrics to demonstrate the superiority of the proposed method.
* The paper is easy to follow.
* From the results in Table 1, DART is worse than DreamerV3 on multiple games. Also, DreamerV3's results are missing from the figures.
* It would be better to evaluate DART's performance on multiple domains, such as robotic control, Crafter, DMLab, or even Minecraft, to show that the discrete representations can generalize to different scenarios.
* It seems like the core contribution of the paper is from the architecture side. However, there are also many prior works that leverage the transformer arc…
Taxonomy
Topics: Digital Games and Media
Methods: Discrete Abstract Representations for Transformer-based learning (DART)
