One-shot World Models Using a Transformer Trained on a Synthetic Prior
Fabio Ferreira, Moreno Schlageter, Raghu Rajan, Andre Biedenkapp, Frank Hutter

TL;DR
This paper introduces One-Shot World Model (OSWM), a transformer-based world model trained purely on synthetic data that adapts quickly to simple environments and can be used to train policies there. This marks progress toward learning world models solely from synthetic sources.
Contribution
The paper presents a novel transformer world model that is trained only on synthetic data and adapts to a new environment in context from a small set of real transitions, enabling rapid policy learning.
Findings
OSWM adapts quickly to simple environments using only 1k transition steps as context.
It successfully trains policies for grid world and CartPole environments.
Transfer to complex environments remains a challenge.
Abstract
A World Model is a compressed spatial and temporal representation of a real-world environment that allows one to train an agent or execute planning methods. However, world models are typically trained on observations from the real-world environment, and they usually do not enable learning policies for other real environments. We propose One-Shot World Model (OSWM), a transformer world model that is learned in an in-context learning fashion from purely synthetic data sampled from a prior distribution. Our prior is composed of multiple randomly initialized neural networks, with one network modeling the dynamics of each state dimension and each reward dimension of a desired target environment. We adopt the supervised learning procedure of Prior-Fitted Networks by masking next-state and reward at random context positions and query OSWM to make probabilistic predictions based on the remaining transition…
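The data-generating process described in the abstract can be sketched roughly as below. This is a minimal, hypothetical illustration, not the authors' implementation: the two-layer tanh MLPs, the residual state update, the clipping to [-5, 5], the uniform random actions, and the helper names `sample_synthetic_env`, `rollout`, and `make_pfn_batch` are all assumptions. The sketch draws one synthetic environment from a prior of randomly initialized networks (one per state dimension plus one for reward), rolls it out, and hides next-state and reward at random positions so that a sequence model could be trained to predict them from the remaining transitions.

```python
import numpy as np


def make_random_mlp(in_dim, hidden, rng):
    """Randomly initialized 2-layer tanh MLP with one output; never trained (assumed architecture)."""
    w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hidden))
    b1 = rng.normal(0.0, 0.1, size=hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    return lambda x: np.tanh(x @ w1 + b1) @ w2


def sample_synthetic_env(state_dim, action_dim, rng, hidden=16):
    """One draw from the prior: a separate random net per state dimension plus one for reward."""
    in_dim = state_dim + action_dim
    state_nets = [make_random_mlp(in_dim, hidden, rng) for _ in range(state_dim)]
    reward_net = make_random_mlp(in_dim, hidden, rng)

    def step(s, a):
        x = np.concatenate([s, a], axis=-1)
        # Assumption: each net predicts the change of its own dimension (residual update),
        # clipped to keep the random dynamics bounded.
        delta = np.concatenate([net(x) for net in state_nets], axis=-1)
        next_s = np.clip(s + delta, -5.0, 5.0)
        r = reward_net(x)[..., 0]
        return next_s, r

    return step


def rollout(step, state_dim, action_dim, n_steps, rng):
    """Collect transitions (s, a, r, s') from the synthetic env with random actions (assumption)."""
    s = rng.normal(size=state_dim)
    S, A, R, S2 = [], [], [], []
    for _ in range(n_steps):
        a = rng.uniform(-1.0, 1.0, size=action_dim)
        s2, r = step(s, a)
        S.append(s); A.append(a); R.append(r); S2.append(s2)
        s = s2
    return np.array(S), np.array(A), np.array(R), np.array(S2)


def make_pfn_batch(S, A, R, S2, n_query, rng):
    """PFN-style split: context rows keep their targets, query rows have next-state and reward masked."""
    idx = rng.permutation(len(S))
    q, c = idx[:n_query], idx[n_query:]
    context = np.concatenate([S[c], A[c], S2[c], R[c, None]], axis=1)
    query_inputs = np.concatenate([S[q], A[q]], axis=1)
    query_targets = np.concatenate([S2[q], R[q, None]], axis=1)  # what the model must predict
    return context, query_inputs, query_targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    step = sample_synthetic_env(state_dim=4, action_dim=1, rng=rng)
    S, A, R, S2 = rollout(step, 4, 1, n_steps=1000, rng=rng)
    ctx, q_in, q_tgt = make_pfn_batch(S, A, R, S2, n_query=100, rng=rng)
    print(ctx.shape, q_in.shape, q_tgt.shape)  # (900, 10) (100, 5) (100, 5)
```

The transformer and its probabilistic loss are omitted here. In a PFN-style setup, a fresh environment would typically be drawn from the prior for every training batch, so the model learns to infer dynamics from the context rather than memorizing a single environment.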
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- Training world models entirely on synthetic data generated from randomly initialized neural networks is a novel and intriguing idea.
- Promising results on simple environments: the authors demonstrate successful agent training on simple environments purely from synthetic priors. This suggests the potential of the approach for rapid adaptation to new tasks. Leveraging in-context learning allows quick adaptation to unseen environments without extensive retraining.
- The paper provides a deta
- Limited applicability to complex environments: the current model struggles with harder environments, highlighting the need for further development.
- The problem studied here is novel and quite interesting.
- The results, while not exceptional, are still promising and show the potential of synthetic data for real problems.
- I wonder whether it is right to call the model a one-shot model. During inference, the model uses 1000 transitions as in-context examples, which may comprise a different number of episodes depending on the environment. In some cases, the environment may be non-episodic. In general, 1-shot refers to using one in-context example, so I believe calling 1000 transitions one in-context example can be a bit misleading. Maybe a more general term, such as few-shot or in-context world models, would fit better?
- A more thorough ana
1. I find the proposed approach simple and novel.
1. In my opinion, this paper falls more on the empirical-contributions end of the spectrum. Through that lens, I find the presented results too limited and not sufficiently impactful. At this stage, the work shows signs of life but needs more convincing results on more difficult tasks in order to have an impact on the scientific community. For example, there has been world model work that assumes certain properties in the DeepMind Control Suite [Hao et al, 2021](https://arxiv.org/abs/2112.02817) and gets s
- The idea is sound.
- I also like that the authors kept the randomly initialized NN simple with recurrent units.
Overall, the paper feels incomplete, with multiple concerns as stated below.
- Writing: Section 3.2 was difficult to follow. In addition, I would encourage the authors to improve the figure captions.
- Techniques: The momentum prior feels engineered to the task at hand. Ideally, I'd like to see how this prior (or an updated version of it) can help in a wide range of tasks.
- Experiments: The authors test their work on a small subset of the environments. Given the work introduces synthetic sam
Taxonomy
Topics: Advanced Vision and Imaging · Neural Networks and Applications
