LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes; Quentin Le Lidec; Damien Scieur; Yann LeCun; Randall Balestriero

arXiv:2603.19312·cs.LG·March 26, 2026

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero

PDF

Open Access 4 Models 5 Datasets

TL;DR

LeWorldModel introduces a stable, end-to-end pixel-based world model using a simplified loss structure, enabling faster training and effective physical understanding without complex auxiliary components.

Contribution

LeWM is the first JEPA trained end-to-end from pixels with only two loss terms, simplifying training and improving stability compared to prior methods.

Findings

01

Faster training, up to 48x speedup over foundation models.

02

Effective encoding of physical structure in latent space.

03

Reliable detection of physically implausible events.

Abstract

Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Models

Datasets

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsExplainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning