Learning Local Causal World Models with State Space Models and Attention

Francesco Petri; Luigi Asprino; Aldo Gangemi

arXiv:2505.02074 · cs.LG · May 6, 2025

TL;DR

This paper shows empirically that State Space Models (SSMs) can model the dynamics of a simple environment and learn a causal representation of it at the same time, with equivalent or better performance than an equivalent Transformer. This broadens research at the intersection of causality and neural world modeling.

Contribution

It empirically shows that SSMs can learn causal representations with performance comparable to or better than that of an equivalent Transformer, highlighting their potential for causal world modeling.

Findings

01 SSMs can model the dynamics of a simple environment.

02 SSMs can learn a causal representation of that environment at the same time.

03 SSMs match or outperform an equivalent Transformer in this simple-environment setting.

Abstract

World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Despite their impressive performance, many solutions fail to learn a causal representation of the environment they are trying to model, which would be necessary to gain a deep enough understanding of the world to perform complex tasks. With this work, we aim to broaden the research in the intersection of causality theory and neural world modelling by assessing the potential for causal discovery of the State Space Model (SSM) architecture, which has been shown to have several advantages over the widespread Transformer. We show empirically that, compared to an equivalent Transformer, an SSM can model the dynamics of a simple environment and learn a causal model at the same time with equivalent or better…
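For readers unfamiliar with the SSM architecture the abstract refers to, here is a minimal, generic sketch of a diagonal linear state space layer in plain Python. It is not the authors' model and says nothing about how they extract a causal model. The diagonal parameterization, the zero-order-hold discretization (a common choice in the SSM literature), the function names `discretize_zoh`, `ssm_scan`, `ssm_conv`, and all parameter values are illustrative assumptions. The sketch shows the property usually cited as an SSM advantage over attention: the same layer can be run as a linear-time recurrence or as a convolution, and both give identical outputs.

```python
import math
from typing import List, Tuple


def discretize_zoh(a: List[float], b: List[float], dt: float) -> Tuple[List[float], List[float]]:
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    Each state dimension i follows dh_i/dt = a_i * h_i + b_i * x, giving
      a_bar_i = exp(dt * a_i)
      b_bar_i = (exp(dt * a_i) - 1) / a_i * b_i   (assumes a_i != 0)
    """
    a_bar = [math.exp(dt * ai) for ai in a]
    b_bar = [(math.exp(dt * ai) - 1.0) / ai * bi for ai, bi in zip(a, b)]
    return a_bar, b_bar


def ssm_scan(x: List[float], a_bar: List[float], b_bar: List[float], c: List[float]) -> List[float]:
    """Recurrent form: h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = c . h_t, with h_{-1} = 0."""
    h = [0.0] * len(a_bar)
    ys = []
    for xt in x:
        h = [ai * hi + bi * xt for ai, hi, bi in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_conv(x: List[float], a_bar: List[float], b_bar: List[float], c: List[float]) -> List[float]:
    """Convolutional form: y_t = sum_k K_k * x_{t-k}, with K_k = sum_i c_i * a_bar_i**k * b_bar_i."""
    length = len(x)
    kernel = [sum(ci * ai ** k * bi for ci, ai, bi in zip(c, a_bar, b_bar)) for k in range(length)]
    return [sum(kernel[k] * x[t - k] for k in range(t + 1)) for t in range(length)]


# Hypothetical example parameters: three stable (negative) state poles, one input channel.
a_bar, b_bar = discretize_zoh(a=[-0.5, -1.0, -2.0], b=[1.0, 1.0, 1.0], dt=0.1)
c = [0.3, -0.2, 0.5]
x = [1.0, 0.0, 0.5, -1.0, 2.0]

y_rec = ssm_scan(x, a_bar, b_bar, c)   # O(L) sequential pass
y_conv = ssm_conv(x, a_bar, b_bar, c)  # same outputs via a precomputed kernel
```

The recurrent form is what makes SSM inference cheap step by step, while the convolutional form allows parallel training. Attention, by contrast, compares every pair of time steps.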

Taxonomy

Topics: Bayesian Modeling and Causal Inference

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Adam · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax