JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

Jing Yu Lim; Rushi Shah; Zarif Ikram; Samson Yu; Haozhe Ma; Tze-Yun Leong; Dianbo Liu

arXiv:2605.13013 · cs.LG · May 14, 2026

TL;DR

JEDI introduces an end-to-end latent diffusion world model for reinforcement learning that improves efficiency and performance by directly learning predictive latents through diffusion denoising.

Contribution

JEDI is the first online end-to-end latent diffusion world model that integrates diffusion objectives with JEPA-style predictive learning.
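
The page does not give JEDI's training details, so the following is only a minimal NumPy sketch of how a joint-embedding diffusion loss could be set up, not the authors' method. Everything named here is hypothetical: the linear encoder `encode`, the linear denoiser `denoise`, the EMA target encoder updated by `ema_update`, the log-sigma conditioning, and the additive Gaussian noising `z + sigma * eps`. What it illustrates is the general idea: the denoising target is a learned latent of the next observation rather than pixels, so the same loss that trains the dynamics model can also shape the encoder.

```python
import numpy as np

def encode(W, obs):
    # Hypothetical one-layer encoder: observations (B, D_obs) -> latents (B, D_z).
    return np.tanh(obs @ W)

def denoise(V, z_noisy, z_ctx, action, sigma):
    # Hypothetical linear denoiser: predicts the clean next latent from the noisy
    # next latent, the context latent, a one-hot action and log(sigma).
    noise_feat = np.full((z_noisy.shape[0], 1), np.log(sigma))
    feats = np.concatenate([z_noisy, z_ctx, action, noise_feat], axis=1)
    return feats @ V

def joint_embedding_diffusion_loss(W_online, W_target, V, obs_t, action, obs_next, sigma, rng):
    # With autodiff, gradients would reach W_online and V, but not W_target.
    z_ctx = encode(W_online, obs_t)       # context latent from the online encoder
    z_next = encode(W_target, obs_next)   # target latent, treated as stop-gradient
    eps = rng.standard_normal(z_next.shape)
    z_noisy = z_next + sigma * eps        # additive Gaussian noising (assumed schedule)
    z_pred = denoise(V, z_noisy, z_ctx, action, sigma)
    return float(np.mean((z_pred - z_next) ** 2))  # loss lives in latent space, no pixel decoder

def ema_update(W_target, W_online, tau=0.99):
    # Target encoder slowly tracks the online encoder (assumed EMA scheme).
    return tau * W_target + (1.0 - tau) * W_online

# Toy usage with random data.
rng = np.random.default_rng(0)
B, D_obs, D_z, D_a = 4, 16, 8, 3
W_online = 0.1 * rng.standard_normal((D_obs, D_z))
W_target = W_online.copy()
V = 0.1 * rng.standard_normal((2 * D_z + D_a + 1, D_z))
obs_t = rng.standard_normal((B, D_obs))
obs_next = rng.standard_normal((B, D_obs))
action = np.eye(D_a)[rng.integers(0, D_a, size=B)]
loss = joint_embedding_diffusion_loss(W_online, W_target, V, obs_t, action, obs_next, 0.5, rng)
W_target = ema_update(W_target, W_online)
```

Using an EMA or stop-gradient target is the standard JEPA-style guard against representational collapse, where the encoder maps every observation to a constant that is trivially easy to predict. Whether JEDI uses this exact mechanism is not stated on this page.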

Findings

01

On Atari100k, JEDI outperforms the latent diffusion baseline that relies on separately trained latents.

02

JEDI uses 43% less VRAM than the pixel diffusion baseline.

03

JEDI achieves over 3× faster world-model sampling and 2.5× faster training.

Abstract

Diffusion world models have recently become competitive for online model-based reinforcement learning, but current approaches expose a tension: pixel diffusion is effective but computationally expensive, while the latest latent diffusion approach improves efficiency yet underperforms. The latter also relies on separately trained latents rather than the end-to-end world-model objectives that have driven much of modern MBRL progress. In particular, JEPA-style predictive representation learning has emerged as an especially promising direction for world modeling and MBRL. Concurrently, diffusion-style objectives have gained traction across multiple domains, with iterative refinement as a promising approach for multimodal and stochastic targets. Taken together, these trends motivate Joint Embedding DIffusion (JEDI), the first online end-to-end latent diffusion world model. JEDI learns its…
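
To make "iterative refinement" concrete at sampling time, here is a hedged sketch of a deterministic Euler sampler for the probability-flow ODE, in the style popularised by EDM (Karras et al., 2022), applied to latents. The page does not say which sampler, noise schedule, or step count JEDI uses; the helper `make_sigmas`, its geometric schedule, and the `denoise_fn(z, z_ctx, action, sigma)` signature are assumptions for illustration, and the sketch assumes the next latent has the same shape as the context latent. Each step calls the denoiser once, so running these steps on compact latents instead of full-resolution frames lowers the per-step cost, which is the general efficiency argument for latent diffusion.

```python
import numpy as np

def make_sigmas(sigma_max=10.0, sigma_min=0.01, num_levels=8):
    # Geometric noise levels from high to low, ending at sigma = 0 (assumed schedule).
    return np.append(np.geomspace(sigma_max, sigma_min, num_levels - 1), 0.0)

def euler_sample(denoise_fn, z_ctx, action, sigmas, rng):
    # Start from pure Gaussian noise at the largest noise level.
    z = sigmas[0] * rng.standard_normal(z_ctx.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        z_hat = denoise_fn(z, z_ctx, action, s_cur)  # current estimate of the clean next latent
        d = (z - z_hat) / s_cur                       # probability-flow ODE slope dz/dsigma
        z = z + (s_next - s_cur) * d                  # Euler step to the next, lower noise level
    return z  # at sigma = 0 the final step returns the denoiser's last estimate

# Toy usage: a stand-in denoiser that always predicts a fixed latent.
target = np.arange(6.0).reshape(2, 3)

def oracle_denoiser(z, z_ctx, action, sigma):
    return target

rng = np.random.default_rng(1)
z_next = euler_sample(oracle_denoiser, np.zeros((2, 3)), None, make_sigmas(), rng)
# z_next ≈ target, because the last step lands exactly on the denoiser's estimate
```

With a learned denoiser, the intermediate steps matter: early high-noise steps commit to a coarse mode of a possibly multimodal next-latent distribution, and later steps refine it, which is why iterative refinement suits stochastic targets.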