Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration
Zakaria Mhammedi, James Cohan

TL;DR
This paper introduces a novel exploration method that decouples exploration from policy optimization, using a tree search guided by uncertainty, leading to more efficient exploration and state coverage in complex environments.
Contribution
It proposes a new approach that bypasses reinforcement learning during exploration, enabling more efficient discovery and policy distillation for hard exploration tasks.
Findings
Explores an order of magnitude more efficiently than intrinsic motivation baselines.
Achieves state-of-the-art performance on Montezuma's Revenge, Pitfall!, and Venture.
Successfully solves high-dimensional continuous tasks from images without demonstrations.
Abstract
The process of discovery requires active exploration -- the act of collecting new and informative data. However, efficient autonomous exploration remains a major unsolved problem. The dominant paradigm addresses this challenge by using Reinforcement Learning (RL) to train agents with intrinsic motivation, maximizing a composite objective of extrinsic and intrinsic rewards. We suggest that this approach incurs unnecessary overhead: while policy optimization is necessary for precise task execution, employing such machinery solely to expand state coverage may be inefficient. In this paper, we propose a new approach that explicitly decouples exploration from policy optimization and bypasses RL entirely during the exploration phase. Our method uses a tree-search strategy inspired by the Go-With-The-Winner algorithm, paired with a measure of uncertainty to systematically drive exploration. By…
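To make the idea of uncertainty-guided, population-based search concrete, here is a minimal sketch in the spirit of Go-With-The-Winners. It is not the authors' algorithm. The toy grid world, the count-based uncertainty proxy (1/sqrt(visits)), the assumption that the simulator can be restored to any previously reached state, and all names (`step`, `explore`, `ACTIONS`) are hypothetical choices made for illustration only.

```python
import random
from collections import Counter

# Hypothetical action set for a toy 2D grid world (not from the paper).
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def step(state, action, size):
    """Deterministic toy transition: move one cell and clip to the grid."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), size - 1), min(max(y + dy, 0), size - 1))


def explore(size=10, n_particles=16, n_rounds=200, seed=0):
    """Population-based exploration with no policy optimization.

    Each round, every particle takes one random action. The population is
    then resampled so that particles sitting in rarely visited (high
    uncertainty) states are cloned and the rest tend to be dropped.
    Assumes the simulator can be reset to any stored state.
    """
    rng = random.Random(seed)
    visits = Counter()
    start = (0, 0)
    visits[start] += 1
    particles = [start] * n_particles

    for _ in range(n_rounds):
        # Expand: one random action per particle.
        particles = [step(s, rng.choice(ACTIONS), size) for s in particles]
        for s in particles:
            visits[s] += 1

        # Assumed uncertainty proxy: inverse square root of the visit count.
        weights = [visits[s] ** -0.5 for s in particles]

        # Resample with replacement: the "winners" get duplicated.
        particles = rng.choices(particles, weights=weights, k=n_particles)

    return visits


if __name__ == "__main__":
    visits = explore()
    print(f"distinct states visited: {len(visits)} of {10 * 10}")
```

The resampling step is where exploration is steered: no reward signal or gradient update is involved, only a decision about which frontier states deserve more copies. A learned uncertainty estimate could replace the visit counts in settings where states cannot be enumerated, such as image observations.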
