Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Akshay Mete, Shahid Aamir Sheikh, Tzu-Hsiang Lin, Dileep Kalathil, and P. R. Kumar

arXiv:2602.10044 · cs.LG · February 11, 2026

Open Access

TL;DR

This paper introduces Optimistic World Models (OWMs), a scalable deep RL exploration method that biases model learning towards higher rewards, improving sample efficiency and performance in sparse-reward environments.

Contribution

The paper presents OWMs, a novel, fully gradient-based optimistic exploration framework integrated into deep world models, enhancing exploration without uncertainty estimation.

Findings

01. OWMs improve sample efficiency in sparse-reward tasks.

02. Optimistic DreamerV3 and STORM outperform baseline models.

03. The approach requires minimal modifications to existing frameworks.

Abstract

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmentation with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art…
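To make the reward-biasing idea concrete, here is a minimal toy sketch in Python. It is not the paper's optimistic dynamics loss, and it involves no world model. It only illustrates the general RBMLE principle of adding a reward-favoring term to a likelihood objective and optimizing the sum by plain gradient descent. The function name `reward_biased_fit`, the weight `alpha`, the unit-variance Gaussian model, and the toy reward (equal to the model's predicted mean `theta`) are all assumptions made for illustration.

```python
import numpy as np


def reward_biased_fit(samples, alpha, lr=0.1, steps=500):
    """Toy reward-biased maximum likelihood fit (illustrative assumption).

    Model: unit-variance Gaussian with unknown mean theta.
    Toy reward: reward(theta) = theta, so larger theta is "more rewarding".
    Objective: mean negative log-likelihood - (alpha / n) * reward(theta).
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    data_mean = samples.mean()
    theta = 0.0
    for _ in range(steps):
        grad_nll = theta - data_mean   # d/dtheta of mean 0.5 * (x - theta)^2
        grad_optimism = -alpha / n     # d/dtheta of -(alpha / n) * theta
        theta -= lr * (grad_nll + grad_optimism)
    return theta


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print("plain MLE:      ", reward_biased_fit(data, alpha=0.0))
    print("reward-biased:  ", reward_biased_fit(data, alpha=3.0))
```

In this toy setting the minimizer is the sample mean plus `alpha / n`, so the estimate is pulled toward higher reward, and the pull shrinks as more data arrive. No uncertainty estimate or constrained optimization is involved, which mirrors the property the abstract claims for the gradient-based loss.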

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning