Enter the Void - Planning to Seek Entropy When Reward is Scarce

Ashish Sundar; Chunbo Luo; Xiaoyang Wang

arXiv:2505.16787 · cs.AI · December 19, 2025

TL;DR

This paper introduces a hierarchical planning approach that uses world model predictions to actively seek out informative states. It improves sample efficiency in model-based reinforcement learning on MiniWorld, Crafter, and DeepMind Control tasks.

Contribution

The paper proposes a hierarchical planning method that dynamically manages both replanning and entropy search, improving sample efficiency over traditional curiosity-driven and MPC-based methods.
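
As a rough illustration of what entropy search over a world model's short-horizon predictions could look like, the Python sketch below scores candidate action sequences by the summed entropy of the predicted latent distributions and picks the highest-scoring one. Everything in it is an assumption made for illustration, not the authors' implementation: the toy one-dimensional world model, the names `toy_world_model`, `plan_entropy`, and `seek_entropy`, the exhaustive enumeration of plans, and the horizon of 3. The paper's replanning logic is not modelled here.

```python
import itertools
import math

N_CLASSES = 4  # size of the toy categorical latent (assumption)
MAX_POS = 4    # toy state space is positions 0..MAX_POS (assumption)


def categorical_entropy(probs):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def toy_world_model(pos, action):
    """Hypothetical stand-in for a learned world model.

    Returns the next position and the predicted categorical latent
    distribution there. Predictions become more uncertain (closer to
    uniform) the further right the agent moves.
    """
    next_pos = min(max(pos + action, 0), MAX_POS)
    w = next_pos / MAX_POS  # mixing weight toward the uniform distribution
    probs = [
        (1.0 - w) * (1.0 if i == 0 else 0.0) + w / N_CLASSES
        for i in range(N_CLASSES)
    ]
    return next_pos, probs


def plan_entropy(pos, actions):
    """Sum of predicted latent entropies along one imagined rollout."""
    total = 0.0
    for action in actions:
        pos, probs = toy_world_model(pos, action)
        total += categorical_entropy(probs)
    return total


def seek_entropy(pos, horizon=3):
    """Enumerate all left/right plans and return the most informative one."""
    candidates = list(itertools.product((-1, 1), repeat=horizon))
    scores = [plan_entropy(pos, plan) for plan in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


if __name__ == "__main__":
    plan, score = seek_entropy(pos=1, horizon=3)
    print("chosen plan:", plan, "summed entropy:", round(score, 3))
```

In this toy setup the planner heads toward the region where the model's predictions are least certain. A learned world model would replace `toy_world_model`, and a sampling-based search would typically replace the exhaustive enumeration once the action space or horizon grows.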

Findings

01. Achieves 50% faster maze completion in MiniWorld compared to base Dreamer.

02. Reduces environment steps by 40% to reach the same reward in Crafter.

03. Improves sample efficiency on DeepMind Control tasks.

Abstract

Model-based reinforcement learning (MBRL) offers an intuitive way to increase the sample efficiency of model-free RL methods by simultaneously training a world model that learns to predict the future. These models account for the large majority of training compute and time, and they are used to train actors entirely in simulation, yet they are quickly discarded once training is done. We show in this work that utilising these models at inference time can significantly boost sample efficiency. We propose a novel approach that anticipates and actively seeks out informative states using the world model's short-horizon latent predictions, offering a principled alternative to traditional curiosity-driven methods that chase outdated estimates of high uncertainty states. While many model predictive control (MPC) based methods offer similar alternatives, they typically lack commitment,…

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research

Methods: Balanced Selection