Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Liam Schramm, Abdeslam Boularias

TL;DR
This paper introduces Volume-MCTS, a tree search algorithm that improves long-horizon exploration in Monte Carlo tree search through state occupancy regularization and outperforms AlphaZero on several robot navigation tasks.
Contribution
The paper proposes a new MCTS variant based on policy optimization with state occupancy regularization, bridging the gap between MCTS and sampling-based motion planning.
Findings
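Volume-MCTS outperforms AlphaZero on several robot navigation problems and shows significantly better long-horizon exploration.
The method targets the weakness of standard MCTS in long-horizon exploration relative to sampling-based motion planners such as Rapidly-Exploring Random Trees.
Count-based exploration and sampling-based motion planning can both be derived as approximate solutions to the state occupancy regularized objective.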
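To make "state occupancy" concrete, the sketch below measures how evenly a search has covered a discretized state space by taking the Shannon entropy of the empirical visit distribution. This is an illustrative assumption, not the paper's objective: the function name `occupancy_entropy`, the four-cell discretization, and the visit logs are all hypothetical, and entropy is just one possible regularizer on the occupancy measure.

```python
import math
from collections import Counter


def occupancy_entropy(visited_cells):
    """Shannon entropy (in nats) of the empirical state occupancy distribution.

    `visited_cells` is a sequence of discrete cell ids, one per visited state.
    Higher entropy means the visits are spread more evenly over cells.
    """
    counts = Counter(visited_cells)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical visit logs over a 4-cell discretization of the state space.
concentrated = [0, 0, 0, 0, 0, 0, 1, 0]  # search keeps revisiting cell 0
spread_out = [0, 1, 2, 3, 0, 1, 2, 3]    # search covers all cells evenly

print(occupancy_entropy(concentrated))
print(occupancy_entropy(spread_out))
```

A search whose visits pile up in one region scores lower than one that spreads across the space, which is the kind of behavior an occupancy regularizer is meant to reward.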
Abstract
Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
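The abstract names count-based exploration as one approximate solution to the regularized objective. As a rough illustration of what a count-based rule looks like inside tree search, the sketch below adds a visit-count bonus to each child's value estimate before picking a child. The bonus form `beta / sqrt(n + 1)`, the names `count_bonus` and `select_child`, and the example numbers are assumptions chosen for illustration; the paper's derived selection rule may differ.

```python
import math


def count_bonus(visit_count, beta=1.0):
    """A common count-based exploration bonus (assumed form, not the paper's)."""
    return beta / math.sqrt(visit_count + 1)


def select_child(children, beta=1.0):
    """Return the index of the child maximizing value + count-based bonus.

    `children` is a list of dicts with keys 'value' (estimated return)
    and 'visits' (how often the child's state has been visited).
    """
    scores = [c["value"] + count_bonus(c["visits"], beta) for c in children]
    return max(range(len(children)), key=scores.__getitem__)


# Hypothetical children: one well-explored, one never visited.
children = [
    {"value": 0.50, "visits": 99},
    {"value": 0.45, "visits": 0},
]

print(select_child(children))            # bonus favors the unvisited child
print(select_child(children, beta=0.0))  # no bonus: pure greedy on value
```

With the bonus switched off the search stays greedy on value estimates; with it on, rarely visited states are pulled into the search even when their current value estimate is slightly lower.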
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques
Methods: AlphaZero
