One-shot World Models Using a Transformer Trained on a Synthetic Prior

Fabio Ferreira; Moreno Schlageter; Raghu Rajan; André Biedenkapp; Frank Hutter

arXiv:2409.14084 · cs.LG · October 28, 2024

TL;DR

This paper introduces One-Shot World Model (OSWM), a transformer-based approach trained on synthetic data that quickly adapts to simple environments for policy training, marking progress towards learning world models solely from synthetic sources.

Contribution

The paper presents a transformer world model that is trained purely on synthetic data and adapts to a new environment in context, from a small set of observed transitions, so that policies can be learned inside the model.

Findings

1. OSWM adapts quickly to simple environments using 1,000 transition steps as context.
2. It successfully trains policies for grid world and CartPole environments.
3. Transfer to complex environments remains a challenge.
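The workflow behind these findings (collect a small context of real transitions, then evaluate or train policies inside the model) can be sketched on a toy problem. This is only an illustration under stated assumptions: a nearest-neighbour lookup stands in for the transformer, the five-cell chain environment is hypothetical, and none of the names below come from the paper or its repository.

```python
import random

N_POS = 5  # hypothetical chain: positions 0..4, goal at position 4

def true_step(pos, action):
    """Ground-truth toy dynamics: action 1 moves right, action 0 moves left."""
    new_pos = min(max(pos + (1 if action == 1 else -1), 0), N_POS - 1)
    reward = 1.0 if new_pos == N_POS - 1 else 0.0
    return new_pos, reward

def collect_context(n_transitions=1000, seed=0):
    """Gather context transitions from the real environment with a random policy."""
    rng = random.Random(seed)
    context, pos = [], 0
    for _ in range(n_transitions):
        action = rng.randrange(2)
        new_pos, reward = true_step(pos, action)
        context.append((pos, action, new_pos, reward))
        pos = 0 if new_pos == N_POS - 1 else new_pos  # reset at the goal
    return context

class NearestNeighbourWorldModel:
    """Stand-in for an in-context world model (NOT the OSWM transformer)."""

    def __init__(self, context):
        self.context = context

    def predict(self, pos, action):
        # Return the outcome of the closest stored (state, action) pair.
        def distance(t):
            return (t[0] - pos) ** 2 + (t[1] != action)
        _, _, new_pos, reward = min(self.context, key=distance)
        return new_pos, reward

def imagined_return(model, policy, start=0, horizon=4):
    """Roll a policy out inside the model only; the real environment is not touched."""
    pos, total = start, 0.0
    for _ in range(horizon):
        pos, reward = model.predict(pos, policy(pos))
        total += reward
    return total

context = collect_context()
model = NearestNeighbourWorldModel(context)
policies = {"always_left": lambda pos: 0, "always_right": lambda pos: 1}
best = max(policies, key=lambda name: imagined_return(model, policies[name]))
```

Selecting between two fixed policies is a deliberate simplification; a reinforcement learning algorithm would take the place of the `max` over candidates.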

Abstract

A World Model is a compressed spatial and temporal representation of a real-world environment that allows one to train an agent or execute planning methods. However, world models are typically trained on observations from the real-world environment, and they usually do not enable learning policies for other real environments. We propose One-Shot World Model (OSWM), a transformer world model that is learned in an in-context learning fashion from purely synthetic data sampled from a prior distribution. Our prior is composed of multiple randomly initialized neural networks, where each network models the dynamics of each state and reward dimension of a desired target environment. We adopt the supervised learning procedure of Prior-Fitted Networks by masking next-state and reward at random context positions and querying OSWM to make probabilistic predictions based on the remaining transition…
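The prior and masking scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: small randomly initialized tanh MLPs stand in for the prior networks (the reviews mention recurrent units and a momentum prior, neither of which is modelled here), actions are sampled uniformly, and the names `make_random_mlp`, `sample_synthetic_transitions`, `mask_targets` and the `mask_prob` parameter are hypothetical.

```python
import math
import random

def make_random_mlp(in_dim, hidden, rng):
    """Return a scalar-valued function computed by a randomly initialized MLP."""
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    def f(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return sum(w * hi for w, hi in zip(w2, h))

    return f

def sample_synthetic_transitions(state_dim, action_dim, n_steps, seed=0, hidden=8):
    """Roll out one synthetic 'environment' drawn from the (assumed) prior."""
    rng = random.Random(seed)
    in_dim = state_dim + action_dim
    # One random network per next-state dimension, plus one for the reward.
    nets = [make_random_mlp(in_dim, hidden, rng) for _ in range(state_dim + 1)]
    state = [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
    transitions = []
    for _ in range(n_steps):
        action = [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]
        x = state + action
        outputs = [math.tanh(net(x)) for net in nets]  # squash to keep rollouts bounded
        next_state, reward = outputs[:-1], outputs[-1]
        transitions.append((state, action, next_state, reward))
        state = next_state
    return transitions

def mask_targets(transitions, mask_prob, seed=0):
    """Hide next-state and reward at random positions; return inputs and targets."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for s, a, s_next, r in transitions:
        if rng.random() < mask_prob:
            inputs.append((s, a, None, None))  # query position: model must predict
            targets.append((s_next, r))
        else:
            inputs.append((s, a, s_next, r))   # visible context position
            targets.append(None)
    return inputs, targets
```

In a training loop of this kind, the model would see `inputs` and be trained to predict the entries of `targets` that are not `None`; the loss and architecture are left out because the excerpt above does not specify them.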

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Training world models entirely on synthetic data generated from randomly initialized neural networks is a novel and intriguing idea.
- Promising results on simple environments: The authors demonstrate successful agent training on simple environments purely from synthetic priors. It suggests the potential of this approach for rapid adaptation to new tasks. Leveraging in-context learning allows for quick adaptation to unseen environments without extensive retraining.
- The paper provides a deta

Weaknesses

- Limited applicability to complex environments: The current model struggles with harder environments, highlighting the need for further development.

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- The problem studied here is novel and quite interesting.
- The results, while not very exceptional, are still promising and show the potential of synthetic data for real problems.

Weaknesses

- I wonder whether it is right to call the model a one-shot model. During inference, the model uses 1000 transitions as in-context examples, which may comprise a different number of episodes depending on the environment. In some cases, the environment may be non-episodic. In general, one-shot refers to using one in-context example. I believe calling 1000 transitions one in-context example can be a bit misleading. Maybe a more general term would be few-shot or in-context world models?
- A more thorough ana
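The reviewer's point about episode counts can be made concrete with a small sketch. The episode lengths used here are hypothetical and chosen only for illustration; the helper names are not from the paper.

```python
def fixed_length_done_flags(n_transitions, episode_length):
    """Terminal flags for a buffer filled by episodes of a fixed (hypothetical) length."""
    return [(i + 1) % episode_length == 0 for i in range(n_transitions)]

def count_episodes(done_flags):
    """Count completed episodes, plus one if a partial episode trails the last terminal."""
    completed = sum(done_flags)
    trailing = 1 if done_flags and not done_flags[-1] else 0
    return completed + trailing

# The same 1000-transition context covers very different numbers of episodes.
short_episodes = count_episodes(fixed_length_done_flags(1000, 10))
long_episodes = count_episodes(fixed_length_done_flags(1000, 200))
non_episodic = count_episodes([False] * 1000)
```

Under these assumptions, a fixed budget of 1000 transitions corresponds to many episodes in a short-horizon task and to a single unbroken trajectory in a non-episodic one, which is the ambiguity the reviewer raises about the term "one-shot".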

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. I find the proposed approach simple and novel.

Weaknesses

1. In my opinion, this paper falls more on the empirical side of the contribution spectrum. Through that lens, I find the presented results too limited and not sufficiently impactful. At this stage, the work shows signs of life but needs more convincing results on more difficult tasks in order to have an impact on the scientific community. For example, there has been world model work which assumes certain properties in the DeepMind Control Suite [Hao et al., 2021](https://arxiv.org/abs/2112.02817) and gets s

Reviewer 04 · Rating 3 · Confidence 4

Strengths

- The idea is sound.
- I also like that the authors kept the randomly initialized NN simple with recurrent units.

Weaknesses

Overall, the paper feels incomplete, with multiple concerns as stated below.
- Writing: Section 3.2 was difficult to go through. Besides, I would encourage the authors to improve the captions on the Figures.
- Techniques: The momentum prior feels engineered to the task at hand. Ideally, I'd like to see how this prior (or an update of this) can help in a wide range of tasks.
- Experiments: The authors test their work on a small subset of the environments. Given the work introduces synthetic sam

Code & Models

Repositories

automl/oswm · PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Neural Networks and Applications