Self-Improving World Modelling with Latent Actions

Yifu Qiu; Zheng Zhao; Waylon Li; Yftah Ziser; Anna Korhonen; Shay B. Cohen; Edoardo M. Ponti

arXiv:2602.06130 · cs.LG · February 17, 2026

TL;DR

SWIRL is a self-improving framework that learns world models from state-only sequences by treating actions as latent variables. It alternates between variational information maximisation and ELBO maximisation, combined with reinforcement learning, and reports improvements of up to 28% on benchmark tasks.

Contribution

The paper introduces SWIRL, an approach for learning world models from state-only data that combines variational information maximisation and ELBO maximisation with reinforcement learning.
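
As a reference point, the two objectives named above have standard variational forms. The block below writes them in the abstract's notation (prior state $X$, next state $Y$, latent action $Z$, forward world model $P_\theta(Y \mid X, Z)$, inverse dynamics model $Q_\phi(Z \mid X, Y)$) and additionally assumes a prior $p(Z \mid X)$ over latent actions. These are the textbook Barber-Agakov bound and ELBO, offered as a sketch; they are not necessarily the exact losses used in the paper.

```latex
% Phase 1 (Variational Information Maximisation): Barber-Agakov bound, maximised over \theta
I_\theta(Y; Z \mid X) \;\ge\; H(Z \mid X)
  + \mathbb{E}_{p(x)\, p(z \mid x)\, P_\theta(y \mid x, z)}\big[\log Q_\phi(z \mid x, y)\big]

% Phase 2 (ELBO Maximisation): for an observed transition (x, y), maximised over \phi
\log P_\theta(y \mid x) \;\ge\; \mathbb{E}_{Q_\phi(z \mid x, y)}\big[\log P_\theta(y \mid x, z)\big]
  - \mathrm{KL}\big(Q_\phi(z \mid x, y) \,\|\, p(z \mid x)\big)
```

Both bounds become tight when $Q_\phi(z \mid x, y)$ equals the exact posterior $p_\theta(z \mid x, y) \propto p(z \mid x)\, P_\theta(y \mid x, z)$. In the first bound, $y$ is generated by the forward model itself, so its gradient with respect to $\theta$ can be estimated with a score-function (REINFORCE) estimator that treats $\log Q_\phi(z \mid x, y)$ as a reward; this is one plausible place for reinforcement learning to enter, although this summary does not state the paper's exact choice.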

Findings

1. Achieves up to 28% improvement on benchmark tasks.
2. Effectively learns from state-only sequences without explicit action labels.
3. Provides theoretical guarantees for the learning process.

Abstract

Internal modelling of the world -- predicting transitions between previous states $X$ and next states $Y$ under actions $Z$ -- is essential to reasoning and planning for LLMs and VLMs. Learning such models typically requires costly action-labelled trajectories. We propose SWIRL, a self-improvement framework that learns from state-only sequences by treating actions as a latent variable and alternating between Forward World Modelling (FWM) $P_\theta(Y \mid X, Z)$ and Inverse Dynamics Modelling (IDM) $Q_\phi(Z \mid X, Y)$. SWIRL iterates two phases: (1) Variational Information Maximisation, which updates the FWM to generate next states that maximise conditional mutual information with latent actions given prior states, encouraging identifiable consistency; and (2) ELBO Maximisation, which updates the IDM to explain observed transitions, effectively performing coordinate ascent. Both models are…
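
The two-phase loop described in the abstract can be illustrated with a small tabular toy. The Python sketch below is not the authors' implementation: it assumes a single fixed prior state, a uniform prior over two latent actions, three possible next states, and probability tables in place of the LLM/VLM-based FWM and IDM. Every function name in it (`vim_step`, `posterior`, `mi_lower_bound`, and so on) is hypothetical. Phase 1 takes an exact policy-gradient step on the FWM logits, using $\log Q_\phi(z \mid x, y)$ as the reward for generating $y$ under action $z$. Phase 2 replaces the IDM with the exact posterior, which is the ELBO maximiser when the IDM is an unconstrained table.

```python
import math

# Hypothetical tabular toy (not the paper's models): one fixed prior state x,
# latent actions z and next states y indexed by integers.
#   fwm[z][y] = P_theta(y | x, z)   forward world model (FWM)
#   idm[y][z] = Q_phi(z | x, y)     inverse dynamics model (IDM)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fwm_table(logits):
    """One softmax row over next states y per latent action z."""
    return [softmax(row) for row in logits]

def log_evidence(prior, fwm, y):
    """log P_theta(y | x) = log sum_z p(z | x) P_theta(y | x, z)."""
    return math.log(sum(prior[z] * fwm[z][y] for z in range(len(prior))))

def elbo(prior, fwm, q, y):
    """E_q[log P_theta(y | x, z)] - KL(q || p(z | x)) for one observed y."""
    return sum(q[z] * (math.log(fwm[z][y]) + math.log(prior[z]) - math.log(q[z]))
               for z in range(len(prior)) if q[z] > 0)

def posterior(prior, fwm, y):
    """Exact p(z | x, y); for a tabular IDM this maximises the ELBO over q."""
    joint = [prior[z] * fwm[z][y] for z in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]

def mi_lower_bound(prior, fwm, idm):
    """Barber-Agakov bound H(Z | x) + E[log Q_phi(z | x, y)] <= I(Y; Z | x)."""
    entropy = -sum(p * math.log(p) for p in prior if p > 0)
    return entropy + sum(prior[z] * fwm[z][y] * math.log(idm[y][z])
                         for z in range(len(prior)) for y in range(len(fwm[z])))

def true_mi(prior, fwm):
    """I(Y; Z | x) computed exactly from the tables, for reference."""
    n_y = len(fwm[0])
    marg = [sum(prior[z] * fwm[z][y] for z in range(len(prior))) for y in range(n_y)]
    return sum(prior[z] * fwm[z][y] * math.log(fwm[z][y] / marg[y])
               for z in range(len(prior)) for y in range(n_y))

def vim_step(prior, logits, idm, lr=0.1):
    """Phase 1: exact policy-gradient ascent on E[log Q_phi(z | x, y)].

    The reward for generating y under action z is log Q_phi(z | x, y); the
    mean reward under the current FWM row is used as a baseline.
    """
    new_logits = []
    for z, row in enumerate(logits):
        probs = softmax(row)
        rewards = [math.log(idm[y][z]) for y in range(len(row))]
        baseline = sum(p * r for p, r in zip(probs, rewards))
        new_logits.append([logit + lr * prior[z] * p * (r - baseline)
                           for logit, p, r in zip(row, probs, rewards)])
    return new_logits

prior = [0.5, 0.5]                    # uniform p(z | x) over two latent actions
logits = [[1.0, 0.0, -1.0],           # arbitrary initial FWM logits
          [-0.5, 0.5, 0.0]]
idm = [[0.5, 0.5] for _ in range(3)]  # uninformative initial IDM
history = []
for _ in range(20):
    logits = vim_step(prior, logits, idm)               # Phase 1: update the FWM
    fwm = fwm_table(logits)
    idm = [posterior(prior, fwm, y) for y in range(3)]  # Phase 2: update the IDM
    history.append(true_mi(prior, fwm))

print([round(v, 4) for v in history[:3]], round(history[-1], 4))
```

Because the Phase 2 posterior is also the optimal $Q_\phi$ for the information bound, in this toy each round cannot decrease the exact conditional mutual information tracked in `history` as long as the Phase 1 step is small; this is one way to read the abstract's remark that the procedure effectively performs coordinate ascent. The toy never fits the FWM to observed transitions, so it only illustrates the optimisation structure, not the full training signal. With neural models, the ELBO step would be a gradient update and the Phase 1 expectation would be estimated from sampled next states.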

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning