Landmark-Assisted Monte Carlo Planning
David H. Chan, Mark Roberts, Dana S. Nau

TL;DR
This paper introduces probabilistic landmarks to improve Monte Carlo planning in stochastic domains, demonstrating that well-chosen landmarks enhance UCT performance in benchmark MDPs by guiding the search process effectively.
Contribution
It formalizes probabilistic landmarks and adapts the UCT algorithm to use them as subgoals, improving online planning in stochastic environments.
Findings
Well-chosen landmarks can significantly improve UCT performance in benchmark domains.
The optimal balance between greedy landmark achievement and goal achievement is problem-dependent.
The results suggest that landmarks can provide helpful guidance for anytime algorithms solving MDPs.
Abstract
Landmarks, conditions that must be satisfied at some point in every solution plan, have contributed to major advancements in classical planning, but they have seldom been used in stochastic domains. We formalize probabilistic landmarks and adapt the UCT algorithm to leverage them as subgoals to decompose MDPs; core to the adaptation is balancing between greedy landmark achievement and final goal achievement. Our results in benchmark domains show that well-chosen landmarks can significantly improve the performance of UCT in online probabilistic planning, while the best balance of greedy versus long-term goal achievement is problem-dependent. The results suggest that landmarks can provide helpful guidance for anytime algorithms solving MDPs.
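To make the balance between greedy landmark achievement and final goal achievement concrete, here is a minimal Python sketch of one way such a trade-off could enter UCT's action selection. It is an illustration under stated assumptions, not the authors' algorithm: the `greediness` weight, the linear mixing rule, and the names `blended_value`, `ucb1_score`, and `select_action` are all hypothetical. Only the UCB1 exploration bonus is the standard rule used by UCT.

```python
import math


def blended_value(q_landmark: float, q_goal: float, greediness: float) -> float:
    """Mix a landmark (subgoal) value estimate with a final-goal estimate.

    Assumption: a linear mix controlled by a hypothetical `greediness`
    weight in [0, 1]; 1.0 is fully landmark-greedy, 0.0 ignores landmarks.
    """
    if not 0.0 <= greediness <= 1.0:
        raise ValueError("greediness must be in [0, 1]")
    return greediness * q_landmark + (1.0 - greediness) * q_goal


def ucb1_score(value: float, parent_visits: int, child_visits: int,
               exploration: float = math.sqrt(2)) -> float:
    """Standard UCB1: value estimate plus an exploration bonus."""
    if child_visits == 0:
        return math.inf  # untried actions are always selected first
    return value + exploration * math.sqrt(math.log(parent_visits) / child_visits)


def select_action(stats: dict, greediness: float,
                  exploration: float = math.sqrt(2)):
    """Pick the action with the highest UCB1 score on the blended value.

    `stats` maps each action to (q_landmark, q_goal, visit_count).
    """
    parent_visits = sum(visits for _, _, visits in stats.values())

    def score(action):
        q_landmark, q_goal, visits = stats[action]
        value = blended_value(q_landmark, q_goal, greediness)
        return ucb1_score(value, parent_visits, visits, exploration)

    return max(stats, key=score)


# Hypothetical statistics: "a" looks good for the next landmark,
# "b" looks good for the final goal; both have equal visit counts.
example_stats = {"a": (1.0, 0.0, 10), "b": (0.0, 1.0, 10)}
greedy_choice = select_action(example_stats, greediness=1.0)
goal_choice = select_action(example_stats, greediness=0.0)
```

With equal visit counts the exploration bonus is identical for both actions, so the choice depends only on the blended value. Sweeping `greediness` between 0 and 1 is one simple way to explore the problem-dependent balance the abstract describes.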
