Next Embedding Prediction Makes World Models Stronger
George Bredis, Nikita Balagansky, Daniil Gavrilov, Ruslan Rakhimov

TL;DR
NE-Dreamer introduces a decoder-free, transformer-based approach for model-based reinforcement learning that predicts next-step embeddings, improving performance in complex, partially observable environments without reconstruction losses.
Contribution
The paper presents NE-Dreamer, a model-based reinforcement learning (MBRL) agent that uses a temporal transformer to predict next-step encoder embeddings, removing the need for a decoder, reconstruction losses, or auxiliary supervision.
Findings
Matches or exceeds DreamerV3 performance on DeepMind Control Suite
Achieves substantial gains on DMLab tasks involving memory and spatial reasoning
Establishes next-embedding prediction as an effective MBRL framework
Abstract
Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially…
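To make the idea of "temporal predictive alignment in representation space" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the loss, so the cosine-distance objective is an assumption, and the names `next_embedding_loss` and `causal_mask` are hypothetical. The sketch only shows the shape of the objective: the prediction made at step t is compared with the encoder embedding at step t + 1, and a causal mask of the kind a temporal transformer would typically use keeps each step from attending to the future.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def next_embedding_loss(predictions, embeddings):
    """Mean cosine distance between predictions[t] and embeddings[t + 1].

    ASSUMPTION: cosine distance is an illustrative choice of alignment
    loss; the source does not specify which loss NE-Dreamer uses.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings to form a target")
    if len(predictions) != len(embeddings) - 1:
        raise ValueError("expected one prediction per next-step target")
    targets = embeddings[1:]  # target for step t is the embedding at t + 1
    losses = [1.0 - cosine_similarity(p, e) for p, e in zip(predictions, targets)]
    return sum(losses) / len(losses)


def causal_mask(length):
    """mask[i][j] is True when step i may attend to step j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# Toy sequence of three 2-D encoder embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perfect = [[0.0, 1.0], [1.0, 1.0]]      # predictions equal to the next embeddings
opposite = [[0.0, -1.0], [-1.0, -1.0]]  # predictions pointing the wrong way

loss_perfect = next_embedding_loss(perfect, embeddings)
loss_opposite = next_embedding_loss(opposite, embeddings)
```

Under this assumed loss, perfectly aligned predictions give a value near 0 and anti-aligned predictions give a value near 2, so minimizing it pushes the predictor toward the next-step embedding without any pixel reconstruction.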
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
