When do World Models Successfully Learn Dynamical Systems?

Edmund Ross; Claudia Drygala; Leonhard Schwarz; Samir Kaiser; Francesca di Mare; Tobias Breiten; Hanno Gottschalk

arXiv:2507.04898·math.NA·November 25, 2025

Open Access · 3 Reviews

TL;DR

This paper investigates the conditions under which world models with compact latent representations can effectively learn and simulate complex dynamical systems, supported by theoretical insights and empirical validation across diverse physical datasets.

Contribution

It introduces a theoretical framework explaining the success of tokenization in world models and demonstrates its effectiveness through models tested on various physical systems.

Findings

01. Tokenization effectively captures system dynamics.

02. Models successfully simulate heat, wave, and fluid flow datasets.

03. Theoretical conditions for successful reconstruction are characterized.

Abstract

In this work, we explore the use of compact latent representations with learned time dynamics ('World Models') to simulate physical systems. Drawing on concepts from control theory, we propose a theoretical framework that explains why projecting time slices into a low-dimensional space and then concatenating to form a history ('Tokenization') is so effective at learning physics datasets, and characterise when exactly the underlying dynamics admit a reconstruction mapping from the history of previous tokenized frames to the next. To validate these claims, we develop a sequence of models with increasing complexity, starting with least-squares regression and progressing through simple linear layers, shallow adversarial learners, and ultimately full-scale generative adversarial networks (GANs). We evaluate these models on a variety of datasets, including modified forms of the heat and wave…
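The tokenization idea in the abstract, paired with the simplest model in the sequence (least-squares regression), can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy periodic 1D heat equation, the SVD-based linear projection used as a "tokenizer", the latent dimension `d=8`, the history length `k=3`, and all function names.

```python
import numpy as np

def simulate_heat(n_x=64, n_t=200, alpha=0.2, seed=0):
    """Toy data (assumption): explicit finite-difference heat equation, periodic 1D grid."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n_x)
    frames = [u.copy()]
    for _ in range(n_t - 1):
        # Discrete Laplacian with periodic boundaries; alpha <= 0.5 keeps the scheme stable.
        u = u + alpha * (np.roll(u, 1) - 2 * u + np.roll(u, -1))
        frames.append(u.copy())
    return np.stack(frames)  # shape (n_t, n_x)

def fit_tokenizer(frames, d):
    """Hypothetical linear tokenizer: top-d right singular vectors of the snapshot matrix."""
    _, _, vt = np.linalg.svd(frames, full_matrices=False)
    return vt[:d]  # shape (d, n_x)

def build_histories(tokens, k):
    """Concatenate k consecutive tokens into one history vector; the target is the next token."""
    X = np.stack([tokens[i:i + k].ravel() for i in range(len(tokens) - k)])
    Y = tokens[k:]
    return X, Y

frames = simulate_heat()
P = fit_tokenizer(frames, d=8)          # projection to a low-dimensional space
tokens = frames @ P.T                   # one token per time slice, shape (n_t, d)
X, Y = build_histories(tokens, k=3)     # histories of 3 tokens -> next token

# Least-squares map from tokenized history to the next token.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
rel_err = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
print(f"relative one-step error in token space: {rel_err:.2e}")
```

The error reported is an in-sample, one-step error in token space only. It says nothing about long autoregressive rollouts or about reconstruction in the full state space, which are the harder questions the paper addresses.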

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

The paper raises an interesting conceptual question connecting world models and operator learning. The experiments cover several standard PDE benchmarks and attempt to bridge control theory and data-driven modeling.

Weaknesses

1. The paper does not answer “when world models succeed” in any rigorous sense. The so-called theoretical results restate standard observability conditions and a generic PAC-learning existence theorem, without offering new criteria or quantitative conditions. The main theorems are textbook results in nonlinear systems (Kalman, Hermann–Krener) and elementary PAC learnability statements.
2. It is not clear why the models used are “world models.” They are standard autoregressive video predictors

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. To my knowledge, the theoretical PAC formulation of these latent, autoregressive models is a new contribution to data-driven dynamical systems. The authors proceed very systematically, and the rigor and clarity of their theoretical framework and experimental results are well received. The empirical predictions the authors make about when models will or will not be easily learned are strong and elegant.
2. Benchmarking against other methods in terms of performance and complexity is very thorough.

Weaknesses

1. The principal weakness of this paper is its rhetorical presentation. In particular, the introduction makes it very hard to understand what problem the authors want to solve and how their approach differs from other methods. After reading, it becomes clear that they seek to address the lack of applications of autoregressive, latent-space models (i.e. world models) to physical systems rather than video generation, particularly with rigorous guarantees of performance based on observability. Further

Reviewer 03 · Rating 4 · Confidence 2

Strengths

- **Originality**: The paper's main strength is in merging concepts from diverse fields, linking the control-theoretic concept of "observability" to generative "world models" and further framing the latter as operator learning. In doing so it provides a framework that allows deeper insight into the expected performance of world models.
- **Quality**: The work is of good quality, providing rigorous theoretical grounding alongside empirical validation. The theory is formally presented with c

Weaknesses

- **Assumptions / scope of theory**: The main analysis relies on "global observability" and the existence of a continuous inverse map $G$. This is a strong condition which, as the authors indicate, is rarely checkable for nonlinear PDEs; as a result, the PAC theorems are qualitative.
- **Circular validation**: Following the previous comment, the authors specifically state that observability for the full turbulent flow is unknown and thus rely on the model's success as "experimental evidence". T

Taxonomy

Topics: Time Series Analysis and Forecasting · Neural Networks and Applications