Learning Latent Action World Models In The Wild

Quentin Garrido; Tushar Nagarajan; Basile Terver; Nicolas Ballas; Yann LeCun; Michael Rabbat

arXiv:2601.05230 · cs.AI · January 21, 2026

Open Access

TL;DR

This paper introduces a method for learning latent action world models directly from in-the-wild videos, enabling reasoning and planning without explicit action labels, and demonstrates their effectiveness in complex real-world scenarios.

Contribution

It presents a novel approach for learning continuous, constrained latent actions from diverse in-the-wild videos, expanding the applicability of world models beyond controlled environments.

Findings

01. Latent actions can capture complex real-world behaviors.
02. Continuous, constrained latent actions outperform vector quantization.
03. A controller can map known actions to latent space for planning.
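To make the contrast in the second finding concrete, here is a minimal sketch of two ways to limit what a latent action can carry. It is not the paper's implementation: this summary does not say how the authors constrain their latents, so the norm clipping and added Gaussian noise are illustrative assumptions, and the function names `vq_bottleneck` and `constrained_continuous_bottleneck` are hypothetical.

```python
import numpy as np


def vq_bottleneck(z, codebook):
    """Vector quantization: snap each latent action to its nearest codebook entry.

    z: (batch, dim) latents, codebook: (num_codes, dim) entries.
    Returns the quantized latents and the chosen code indices.
    """
    # Squared Euclidean distance between every latent and every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def constrained_continuous_bottleneck(z, max_norm=1.0, noise_std=0.0, rng=None):
    """Continuous alternative (assumed form): add optional noise, then clip the norm.

    The latent stays real-valued, but its capacity is limited because it
    cannot grow beyond max_norm and fine detail is blurred by the noise.
    """
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        z = z + rng.normal(scale=noise_std, size=z.shape)
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    # Scale is 1 for latents already inside the ball, < 1 otherwise.
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return z * scale


rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))         # stand-in for latents inferred from frame pairs
codebook = rng.normal(size=(8, 3))  # stand-in for a learned codebook

z_vq, idx = vq_bottleneck(z, codebook)
z_cont = constrained_continuous_bottleneck(z, max_norm=1.0, noise_std=0.1, rng=rng)
```

In this sketch the quantized latents can only take one of eight values, while the constrained continuous latents can vary smoothly inside a unit ball. That difference is one plausible reading of why a continuous bottleneck might represent richer in-the-wild actions.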

Abstract

Agents capable of reasoning and planning in the real world must be able to predict the consequences of their actions. While world models possess this capability, they most often require action labels, which can be complex to obtain at scale. This motivates the learning of latent action models, which can learn an action space from videos alone. Our work addresses the problem of learning latent action world models on in-the-wild videos, expanding the scope of existing works that focus on simple robotics simulations, video games, or manipulation data. While this allows us to capture richer actions, it also introduces challenges stemming from the video diversity, such as environmental noise or the lack of a common embodiment across videos. To address some of these challenges, we discuss properties that actions should follow as well as relevant architectural choices and evaluations.…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning