DreamerV3-XP: Optimizing exploration through uncertainty estimation
Lukas Bierling, Davide Pasero, Jan-Henrik Bertrand, Kiki Van Gerwen

TL;DR
DreamerV3-XP improves exploration and learning efficiency in reinforcement learning by adding prioritized replay and an intrinsic reward based on disagreement among an ensemble of world models, which leads to faster learning, especially in sparse-reward environments.
Contribution
It extends DreamerV3 with a prioritized replay buffer and an intrinsic reward based on ensemble disagreement, aiming to improve exploration and learning efficiency.
Findings
Faster learning on a subset of Atari100k and DeepMind Control Visual Benchmark tasks.
Lower dynamics model loss in sparse-reward settings.
Original DreamerV3 results confirmed on the evaluated tasks.
Abstract
We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. It adds (i) a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and (ii) an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. DreamerV3-XP is evaluated on a subset of Atari100k and DeepMind Control Visual Benchmark tasks, confirming the original DreamerV3 results and showing that our extensions lead to faster learning and lower dynamics model loss, particularly in sparse-reward settings.
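The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three replay signals are combined or how disagreement is measured. The sketch assumes min-max normalization with a weighted sum, softmax sampling over the resulting scores, and population variance of the ensemble's reward predictions as the disagreement measure. All function and parameter names (`priority_scores`, `sampling_probs`, `disagreement_bonus`, `weights`, `temperature`, `scale`) are hypothetical.

```python
import math
import random
from statistics import pvariance


def priority_scores(returns, recon_losses, value_errors, weights=(1.0, 1.0, 1.0)):
    """Score each trajectory from its return, reconstruction loss, and value error.

    Assumption: each signal is min-max normalized to [0, 1] and the three are
    combined with a weighted sum. The source does not specify the combination.
    """
    def normalize(xs):
        lo, hi = min(xs), max(xs)
        span = hi - lo
        # A constant signal carries no ranking information, so map it to 0.
        return [0.0 if span == 0 else (x - lo) / span for x in xs]

    r = normalize(returns)
    l = normalize(recon_losses)
    v = normalize(value_errors)
    w_r, w_l, w_v = weights
    return [w_r * a + w_l * b + w_v * c for a, b, c in zip(r, l, v)]


def sampling_probs(scores, temperature=1.0):
    """Turn scores into replay sampling probabilities (assumed: softmax)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement_bonus(ensemble_reward_preds, scale=1.0):
    """Intrinsic reward for one step from an ensemble's reward predictions.

    Assumption: disagreement is the population variance across ensemble
    members, multiplied by a hypothetical scale factor.
    """
    return scale * pvariance(ensemble_reward_preds)


# Toy usage with made-up numbers for three stored trajectories.
scores = priority_scores(
    returns=[0.0, 5.0, 1.0],
    recon_losses=[0.2, 0.1, 0.9],
    value_errors=[0.5, 0.4, 0.3],
)
probs = sampling_probs(scores)
rng = random.Random(0)
picked = rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Reward predictions for one imagined step from a hypothetical 3-model ensemble.
bonus = disagreement_bonus([0.1, 0.4, 0.7])
print(picked, bonus)
```

In this sketch the bonus is zero when all ensemble members agree and grows as their reward predictions diverge, which is what makes it useful as an exploration signal when environment rewards are sparse.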
