Enter the Void - Planning to Seek Entropy When Reward is Scarce
Ashish Sundar, Chunbo Luo, Xiaoyang Wang

TL;DR
This paper introduces a hierarchical planning approach that uses world model predictions to actively seek informative states. The authors report significantly improved sample efficiency in model-based reinforcement learning on MiniWorld, Crafter, and DeepMind Control.
Contribution
The paper proposes a novel hierarchical planning method that dynamically manages both replanning and entropy search, improving sample efficiency over traditional curiosity-driven and MPC methods.
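To make the idea of entropy-seeking planning with commitment concrete, here is a minimal sketch; it is not the authors' implementation. It assumes the world model emits categorical distributions for each imagined step of each candidate action sequence. The function names (`categorical_entropy`, `pick_entropy_seeking_plan`, `should_replan`), the `commit_horizon` and `tolerance` parameters, and the hand-written replanning rule are all hypothetical stand-ins for illustration; how the paper's hierarchical planner actually makes this decision is not specified here.

```python
import numpy as np


def categorical_entropy(probs, eps=1e-12):
    """Shannon entropy in nats along the last axis of a probability array."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -(probs * np.log(probs)).sum(axis=-1)


def pick_entropy_seeking_plan(predicted_probs):
    """Pick the candidate rollout with the highest total predicted entropy.

    predicted_probs has shape (n_candidates, horizon, n_classes): for each
    candidate action sequence, the (assumed) categorical distribution the
    world model predicts at each imagined step.
    """
    scores = categorical_entropy(predicted_probs).sum(axis=1)
    return int(np.argmax(scores)), scores


def should_replan(steps_committed, commit_horizon,
                  observed_entropy, expected_entropy, tolerance=0.5):
    """Hypothetical commitment rule, written by hand for illustration.

    Keep executing the current plan until its horizon is used up, or until
    the entropy actually observed drifts too far from what the plan expected.
    """
    if steps_committed >= commit_horizon:
        return True
    return abs(observed_entropy - expected_entropy) > tolerance
```

In this sketch the agent would call `pick_entropy_seeking_plan` once, execute the chosen actions, and consult `should_replan` at each step instead of replanning every step as a plain MPC loop does.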
Findings
Achieves 50% faster maze completion in MiniWorld compared to base Dreamer.
Reduces environment steps by 40% to reach the same reward in Crafter.
Improves sample efficiency on DeepMind Control tasks.
Abstract
Model-based reinforcement learning (MBRL) offers an intuitive way to increase the sample efficiency of model-free RL methods by simultaneously training a world model that learns to predict the future. These models constitute the large majority of training compute and time and they are subsequently used to train actors entirely in simulation, but once this is done they are quickly discarded. We show in this work that utilising these models at inference time can significantly boost sample efficiency. We propose a novel approach that anticipates and actively seeks out informative states using the world model's short-horizon latent predictions, offering a principled alternative to traditional curiosity-driven methods that chase outdated estimates of high uncertainty states. While many model predictive control (MPC) based methods offer similar alternatives, they typically lack commitment,…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
Methods: Balanced Selection
