Factored Latent Action World Models

Zizhao Wang; Chang Shi; Jiaheng Hu; Kevin Rohling; Roberto Martín-Martín; Amy Zhang; Peter Stone

arXiv:2602.16229·cs.LG·February 19, 2026

Open Access

TL;DR

The paper introduces FLAM, a factored latent action model that decomposes a scene into independent factors, each with its own latent action, enabling more accurate modeling of multi-entity dynamics and higher-quality video generation in action-free video settings.

Contribution

It proposes a novel factored dynamics framework that infers separate latent actions for different scene factors, enhancing modeling accuracy in complex multi-entity environments.

Findings

01

FLAM outperforms prior models in prediction accuracy.

02

FLAM improves representation quality.

03

FLAM facilitates downstream policy learning.

Abstract

Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factorized structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Based on…
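The contrast the abstract draws between a monolithic and a factored latent action model can be illustrated with a toy sketch. This is not the paper's architecture: the scalar factor values, the three-entry `CODEBOOK`, the nearest-code inverse dynamics, and the additive forward dynamics are all assumptions chosen to keep the example small. It only shows why one latent action per factor can capture two entities moving in different directions, while a single scene-wide latent action drawn from the same small codebook cannot.

```python
from typing import List, Tuple

# Hypothetical latent action codebook (an assumption for this toy example):
# each code is a scalar displacement applied to a factor.
CODEBOOK: List[float] = [-1.0, 0.0, 1.0]


def infer_latent_action(z_t: float, z_next: float) -> int:
    """Toy inverse dynamics: index of the code closest to the observed change."""
    delta = z_next - z_t
    return min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - delta))


def predict_next(z_t: float, action: int) -> float:
    """Toy forward dynamics: apply the selected code to a single factor."""
    return z_t + CODEBOOK[action]


def factored_step(
    factors_t: List[float], factors_next: List[float]
) -> Tuple[List[int], List[float]]:
    """Each factor infers its own latent action and predicts its own next value."""
    actions = [infer_latent_action(a, b) for a, b in zip(factors_t, factors_next)]
    preds = [predict_next(z, a) for z, a in zip(factors_t, actions)]
    return actions, preds


def monolithic_step(
    factors_t: List[float], factors_next: List[float]
) -> Tuple[int, List[float]]:
    """Simplified baseline: one latent action is chosen for the whole scene."""

    def scene_error(i: int) -> float:
        return sum(
            (z + CODEBOOK[i] - zn) ** 2 for z, zn in zip(factors_t, factors_next)
        )

    action = min(range(len(CODEBOOK)), key=scene_error)
    return action, [z + CODEBOOK[action] for z in factors_t]


# Two entities moving in opposite directions between consecutive frames.
scene_t = [0.0, 5.0]
scene_next = [1.0, 4.0]

actions, factored_pred = factored_step(scene_t, scene_next)
mono_action, mono_pred = monolithic_step(scene_t, scene_next)

print("factored actions:", actions, "prediction:", factored_pred)
print("monolithic action:", mono_action, "prediction:", mono_pred)
```

In this toy setting the factored model reconstructs both motions exactly, while the best single scene-wide code is "no movement", which misses both entities. A monolithic codebook could in principle cover joint motions, but it would need a separate code for each combination of per-entity motions, and that count grows multiplicatively with the number of entities.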

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation