Landmark-Assisted Monte Carlo Planning

David H. Chan; Mark Roberts; Dana S. Nau

arXiv:2508.11493·cs.AI·August 18, 2025

TL;DR

This paper introduces probabilistic landmarks to improve Monte Carlo planning in stochastic domains, demonstrating that well-chosen landmarks enhance UCT performance in benchmark MDPs by guiding the search process effectively.

Contribution

It formalizes probabilistic landmarks and adapts the UCT algorithm to use them as subgoals, improving online planning in stochastic environments.

Findings

01. Well-chosen landmarks can significantly improve UCT performance in benchmark domains.

02. The best balance between greedy landmark achievement and final goal achievement is problem-dependent.

03. Landmarks can provide helpful guidance for anytime algorithms solving MDPs.

Abstract

Landmarks – conditions that must be satisfied at some point in every solution plan – have contributed to major advancements in classical planning, but they have seldom been used in stochastic domains. We formalize probabilistic landmarks and adapt the UCT algorithm to leverage them as subgoals to decompose MDPs; core to the adaptation is balancing between greedy landmark achievement and final goal achievement. Our results in benchmark domains show that well-chosen landmarks can significantly improve the performance of UCT in online probabilistic planning, while the best balance of greedy versus long-term goal achievement is problem-dependent. The results suggest that landmarks can provide helpful guidance for anytime algorithms solving MDPs.
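
To make the greedy-versus-goal trade-off in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: it reduces planning to a single decision, uses the standard UCB1 rule that UCT applies at each tree node, and assumes a hypothetical `greediness` weight that blends "reached the next landmark" with "reached the final goal" into one rollout return. The names `ucb1_select`, `blended_return`, `plan`, and the outcome probabilities are all illustrative assumptions.

```python
import math
import random


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the action index with the highest UCB1 score."""
    total = sum(counts)
    best, best_score = None, -math.inf
    for action, (n, q) in enumerate(zip(counts, values)):
        if n == 0:
            return action  # try every action once before scoring
        score = q + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = action, score
    return best


def blended_return(hit_landmark, hit_goal, greediness):
    """Hypothetical return: weight `greediness` on the next landmark,
    the remainder on the final goal (an assumption, not the paper's rule)."""
    if not 0.0 <= greediness <= 1.0:
        raise ValueError("greediness must lie in [0, 1]")
    return greediness * float(hit_landmark) + (1.0 - greediness) * float(hit_goal)


def plan(greediness, outcome_probs, n_sims=2000, seed=0):
    """Run UCB1 over one decision and return the most-visited action.

    outcome_probs[a] = (P(landmark reached), P(goal reached)) for action a;
    these are made-up numbers standing in for simulated rollouts.
    """
    rng = random.Random(seed)
    counts = [0] * len(outcome_probs)
    values = [0.0] * len(outcome_probs)
    for _ in range(n_sims):
        a = ucb1_select(counts, values)
        p_landmark, p_goal = outcome_probs[a]
        ret = blended_return(rng.random() < p_landmark,
                             rng.random() < p_goal,
                             greediness)
        counts[a] += 1
        values[a] += (ret - values[a]) / counts[a]  # incremental mean
    return max(range(len(counts)), key=counts.__getitem__)


# Action 0 usually reaches the landmark but rarely the goal; action 1 is the reverse.
PROBS = [(0.9, 0.2), (0.3, 0.6)]
greedy_choice = plan(1.0, PROBS)    # landmark-only weighting
patient_choice = plan(0.0, PROBS)   # goal-only weighting
```

In this toy setting the preferred action flips as `greediness` moves from 1 to 0, which is one way to picture why the best balance could depend on the problem.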
