Learning to Play Atari in a World of Tokens

Pranav Agarwal; Sheldon Andrews; Samira Ebrahimi Kahou

arXiv:2406.01361 · cs.LG · June 4, 2024

Open Access 3 Reviews

TL;DR

This paper introduces DART, a transformer-based method that uses discrete representations for sample-efficient model-based reinforcement learning on Atari games. It outperforms previous methods that do not use look-ahead search and surpasses human scores in 9 of 26 games.

Contribution

DART is the first to utilize discrete abstract representations in transformer-based reinforcement learning, enhancing modeling of complex, discrete world properties and improving sample efficiency.

Findings

01

DART achieves a median human-normalized score of 0.790 on Atari 100k.

02

DART outperforms previous state-of-the-art methods without look-ahead search.

03

DART surpasses human performance in 9 out of 26 Atari games.
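The findings above are stated in terms of human-normalized scores. As a minimal sketch of how such a score is commonly computed on Atari benchmarks, the agent's raw score is rescaled so that a random policy maps to 0 and the human reference maps to 1; a game counts as "surpassing human performance" when the normalized score exceeds 1. The game names and raw scores below are hypothetical placeholders, not results from the paper.

```python
from statistics import median


def human_normalized_score(agent_score, random_score, human_score):
    """Rescale a raw score so that random play is 0.0 and human play is 1.0."""
    if human_score == random_score:
        raise ValueError("human and random scores must differ")
    return (agent_score - random_score) / (human_score - random_score)


# Hypothetical raw scores for illustration only (not taken from the paper).
raw_scores = {
    "game_a": {"agent_score": 50.0, "random_score": 0.0, "human_score": 100.0},
    "game_b": {"agent_score": 30.0, "random_score": 10.0, "human_score": 20.0},
    "game_c": {"agent_score": 8.0, "random_score": 2.0, "human_score": 10.0},
}

# Per-game normalized scores, then the aggregate statistics.
normalized = {game: human_normalized_score(**s) for game, s in raw_scores.items()}
median_hns = median(normalized.values())
superhuman_games = [game for game, score in normalized.items() if score > 1.0]

print(normalized)
print("median:", median_hns)
print("above human:", superhuman_games)
```

The median is typically reported alongside the mean because a few games with very large normalized scores can dominate the mean.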

Abstract

Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial…
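The two ingredients named in the abstract, discrete representations and the decoder/encoder split, can be illustrated with a small sketch. It assumes a nearest-neighbour codebook lookup as the way continuous features become discrete tokens (the excerpt does not say how DART tokenizes observations), and it shows only the attention masks that distinguish an auto-regressive decoder from an encoder. All function names and values are hypothetical.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2).

    Assumption: nearest-neighbour lookup stands in for the tokenizer; the
    excerpt above does not specify the actual mechanism.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def attention_mask(num_tokens, causal):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [
        [(j <= i) if causal else True for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


# Hypothetical 2-D codebook and features, for illustration only.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]

tokens = quantize(features, codebook)

# Decoder-style (auto-regressive) mask: each token sees itself and the past.
decoder_mask = attention_mask(len(tokens), causal=True)
# Encoder-style mask: every token sees every other token.
encoder_mask = attention_mask(len(tokens), causal=False)

print(tokens)
print(decoder_mask)
print(encoder_mask)
```

Because each feature snaps to exactly one codebook index, there is no intermediate value between two tokens, which is the property the abstract contrasts with continuous representations.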

Peer Reviews

Decision · ICML 2024 Poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- The main strength of this paper is the clarity of the idea and the presentation. The paper combines various existing approaches in a way that is not too complicated to understand or implement.
- The novelty over IRIS, the previous approach with discrete tokens, lies in being able to learn a policy on latent states rather than on reconstructed observations. The advantage of not using reconstructed observations is that it is a lot more computationally ef

Weaknesses

These are not weaknesses per se, but the reviewer thinks the paper can be improved in these respects:
- The approach is simple (which is good) and integrates components that already exist in the literature. Learning a world model on discrete tokens was previously introduced in IRIS, and using a ViT policy head (which the authors claim to be their main novelty) has been studied by Yoon et al. 2023 (https://arxiv.org/abs/2302.04419). Usage of a memory token to feed past context has also be

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

- The paper is well-written and easy to follow.
- The ablation study provided in the paper is well-designed and informative.

Weaknesses

- I need more clarification about the motivation that long-range dependencies impede Dreamer's learning. Could the authors provide experiments comparing the world model accuracy among the proposed method and prior non-transformer-based methods on some long-range tasks?
- I think the world model accuracy should be measured in more detail (i.e., future-state prediction accuracy, reward prediction accuracy, etc.) to fully support the authors' arguments on the RNN-based world model and Transformer-ba

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

* The paper proposes a transformer-based architecture for world modeling and policy learning and shows it's quite effective on Atari 100k.
* The paper conducts extensive experiments on Atari 100k and provides many metrics to demonstrate the superiority of the proposed method.
* The paper is easy to follow.

Weaknesses

* From the results in Table 1, DART is worse than DreamerV3 on multiple games. Also, DreamerV3's results are missing from the figures.
* It would be better to evaluate DART's performance on multiple domains, such as robotic control, Crafter, DMLab, or even Minecraft, to show that the discrete representations can generalize to different scenarios.
* It seems like the core contribution of the paper is from the architecture side. However, there are also many prior works that leverage the transformer arc

Taxonomy

Topics: Digital Games and Media

Methods: Discrete Abstract Representations for Transformer-based learning (DART)