DreamerV3-XP: Optimizing exploration through uncertainty estimation

Lukas Bierling; Davide Pasero; Jan-Henrik Bertrand; Kiki Van Gerwen

arXiv:2510.21418 · cs.LG · October 27, 2025

TL;DR

DreamerV3-XP extends DreamerV3 with a prioritized replay buffer and an intrinsic reward based on ensemble disagreement over predicted rewards, yielding faster learning, particularly in sparse-reward environments.

Contribution

The paper introduces DreamerV3-XP, an extension of DreamerV3 that adds prioritized replay and an intrinsic reward mechanism to improve exploration and learning efficiency.

Findings

01

Faster learning on a subset of Atari100k and DeepMind Control Visual Benchmark tasks.

02

Lower dynamics model loss in sparse-reward settings.

03

The original DreamerV3 results are confirmed on the evaluated tasks, and both proposed extensions prove effective.

Abstract

We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. This includes (i) a prioritized replay buffer, scoring trajectories by return, reconstruction loss, and value error and (ii) an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. DreamerV3-XP is evaluated on a subset of Atari100k and DeepMind Control Visual Benchmark tasks, confirming the original DreamerV3 results and showing that our extensions lead to faster learning and lower dynamics model loss, particularly in sparse-reward settings.
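
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the three priority signals are combined or how disagreement is measured. The weighted sum, the softmax sampling, the `temperature` and `beta` parameters, and the use of the standard deviation as the disagreement measure are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math
import random
import statistics


def priority_score(ret, recon_loss, value_error, weights=(1.0, 1.0, 1.0)):
    """Hypothetical priority: weighted sum of return, reconstruction loss,
    and absolute value error (the combination rule is an assumption)."""
    w_ret, w_rec, w_val = weights
    return w_ret * ret + w_rec * recon_loss + w_val * abs(value_error)


def sampling_probs(scores, temperature=1.0):
    """Softmax over priority scores, so every trajectory keeps a nonzero
    chance of being replayed (assumed sampling scheme)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_indices(scores, k, rng):
    """Draw k trajectory indices with probability given by the softmax."""
    probs = sampling_probs(scores)
    return rng.choices(range(len(scores)), weights=probs, k=k)


def intrinsic_reward(ensemble_reward_preds):
    """Disagreement measured as the population standard deviation of the
    ensemble's reward predictions for one step (assumed measure)."""
    return statistics.pstdev(ensemble_reward_preds)


def total_reward(extrinsic, ensemble_reward_preds, beta=0.1):
    """Extrinsic reward plus a scaled exploration bonus (beta is assumed)."""
    return extrinsic + beta * intrinsic_reward(ensemble_reward_preds)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three toy trajectories: (return, reconstruction loss, value error).
    trajectories = [(1.0, 0.2, 0.1), (0.0, 1.5, -0.8), (0.5, 0.1, 0.0)]
    scores = [priority_score(*t) for t in trajectories]
    print("priorities:", scores)
    print("sampled:", sample_indices(scores, k=5, rng=rng))
    # Ensemble members that agree give no bonus; members that differ do.
    print("bonus (agree):", intrinsic_reward([0.5, 0.5, 0.5]))
    print("bonus (disagree):", intrinsic_reward([0.0, 1.0]))
```

In this sketch, trajectories with a high return or a high model error are replayed more often. The exploration bonus vanishes where the ensemble members agree, so it steers the agent toward states whose rewards the world models predict inconsistently.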
