Co-Evolving Latent Action World Models

Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, and Jiang Bian

arXiv:2510.26433·cs.LG·April 7, 2026

TL;DR

CoLA-World introduces a joint training paradigm for latent action world models in which the latent action model and the world model co-evolve, improving video simulation quality and visual planning performance.

Contribution

The paper proposes a method for jointly training a from-scratch latent action model (LAM) with a pretrained world model, overcoming representational collapse and enhancing control and simulation quality.

Findings

01 Matches or outperforms prior methods in video simulation quality.

02 Improves downstream visual planning performance.

03 Successfully implements co-evolution of models through a warm-up phase.

Abstract

Adapting pretrained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train them jointly, but this is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pretrained world model. This unlocks a co-evolution cycle:…
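
To make the warm-up-then-joint idea in the abstract concrete, here is a minimal toy sketch in NumPy. It is not the CoLA-World implementation: the linear models, the choice to freeze the world model during warm-up, and every name (`run_toy_schedule`, `A`, `B`, `W`, `hidden_action`) are assumptions made purely for illustration. A from-scratch encoder infers a latent action from a frame pair, a stand-in "pretrained" world model predicts the next frame from the current frame and that latent action, and both are trained on the same prediction loss.

```python
import numpy as np

def run_toy_schedule(warmup_steps=50, joint_steps=50, lr=0.05, seed=0):
    """Toy two-phase schedule (hypothetical, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, d_obs, d_act = 64, 4, 2

    # Toy data: the next frame is the current frame plus the effect of a hidden action.
    obs = rng.normal(size=(n, d_obs))
    hidden_action = rng.normal(size=(n, d_act))
    effect = rng.normal(size=(d_act, d_obs))
    nxt = obs + 0.1 * hidden_action @ effect

    # Stand-in for a pretrained world model: pred = obs @ A + z @ B.
    A = np.eye(d_obs)
    B = 0.1 * rng.normal(size=(d_act, d_obs))
    # From-scratch latent action encoder: z = [obs, nxt] @ W.
    W = 0.01 * rng.normal(size=(2 * d_obs, d_act))

    pairs = np.concatenate([obs, nxt], axis=1)
    losses = []
    for step in range(warmup_steps + joint_steps):
        z = pairs @ W                      # inferred latent actions
        err = obs @ A + z @ B - nxt        # next-frame prediction error
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))

        g = 2.0 * err / n                  # gradient of the loss w.r.t. the prediction
        grad_W = pairs.T @ (g @ B.T)       # flows through the world model into the encoder
        grad_A = obs.T @ g
        grad_B = z.T @ g

        W -= lr * grad_W                   # the encoder is updated in both phases
        if step >= warmup_steps:           # ASSUMPTION: world model is frozen during warm-up
            A -= lr * grad_A
            B -= lr * grad_B
    return losses
```

Calling `run_toy_schedule()` returns the per-step loss, with the first `warmup_steps` entries coming from the encoder-only phase. In this toy, the warm-up lets the encoder adapt to the fixed world model before both sets of parameters move together; whether the actual method freezes the world model in this way is not stated in the abstract.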
