JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning
Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu

TL;DR
JEDI introduces an end-to-end latent diffusion world model for reinforcement learning that improves efficiency and performance by directly learning predictive latents through diffusion denoising.
Contribution
It is the first online end-to-end latent diffusion world model that integrates diffusion objectives with JEPA-style predictive learning.
Findings
JEDI outperforms the baseline that uses separately trained latents on Atari100k.
JEDI uses 43% less VRAM than the pixel diffusion baseline.
JEDI achieves over 3× faster world-model sampling and 2.5× faster training.
Abstract
Diffusion world models have recently become competitive for online model-based reinforcement learning (MBRL), but current approaches expose a tension: pixel diffusion is effective but computationally expensive, while the latest latent diffusion approach improves efficiency but underperforms. The latter also relies on separately trained latents rather than the end-to-end world-model objectives that have driven much of modern MBRL progress. In particular, JEPA-style predictive representation learning has emerged as an especially promising direction for world modeling and MBRL. Concurrently, diffusion-style objectives have gained traction across multiple domains, with iterative refinement as a promising approach for multimodal and stochastic targets. Taken together, these trends motivate Joint Embedding DIffusion (JEDI), the first online end-to-end latent diffusion world model. JEDI learns its…
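To make the idea of a denoising objective on predictive latents concrete, here is a toy sketch in pure Python. It is not the paper's implementation: the linear encoder, the three-parameter denoiser, the single noise scale `sigma`, and every function name are hypothetical choices made for illustration, and the summary above does not specify how JEDI handles any of them.

```python
import random


def encode(obs, weights):
    """Hypothetical linear encoder: maps an observation vector to a latent."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def add_noise(latent, sigma, rng):
    """Corrupt a clean latent with Gaussian noise of scale sigma."""
    return [z + sigma * rng.gauss(0.0, 1.0) for z in latent]


def denoise(noisy_next, context, action, params):
    """Hypothetical denoiser: predicts the clean next latent from its noisy
    version, conditioned on the current latent and a scalar action."""
    a, b, c = params
    return [a * n + b * h + c * action for n, h in zip(noisy_next, context)]


def latent_denoising_loss(obs, action, next_obs, weights, params, sigma, rng):
    """Mean squared error between the denoised prediction and the clean
    next-step latent. The same encoder produces both context and target,
    which is the sense in which the latent is learned jointly here; how
    gradients reach the target is a design choice this sketch leaves open."""
    context = encode(obs, weights)
    target = encode(next_obs, weights)
    noisy = add_noise(target, sigma, rng)
    pred = denoise(noisy, context, action, params)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]  # 3-dim observation -> 2-dim latent
obs, next_obs = [1.0, 0.0, 2.0], [0.5, 1.0, 1.5]
identity_params = (1.0, 0.0, 0.0)  # denoiser that simply returns its noisy input

loss = latent_denoising_loss(obs, 1.0, next_obs, weights, identity_params, 0.5, rng)
print(loss)
```

With the identity denoiser the loss is just the average squared noise added to the target latent, so it drops to zero when `sigma` is zero. A trained denoiser would instead use the context latent and the action to recover the clean target from a noisy one.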
