Policy-Guided Heuristic Search with Guarantees
Laurent Orseau, Levi H. S. Lelis

TL;DR
This paper introduces Policy-guided Heuristic Search (PHS), a search algorithm for single-agent deterministic problems that uses both a policy and a heuristic function and comes with guarantees on its search loss (e.g., the number of search steps).
Contribution
PHS is a search method that uses both a policy and a heuristic function. It provides theoretical guarantees on the search loss and is reported to outperform existing search algorithms empirically.
Findings
PHS compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT.
PHS enables rapid learning of both a policy and a heuristic function.
PHS solves more problems, and in less search time, across multiple domains.
Abstract
The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and it can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of using A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided…
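The abstract is cut off before PHS itself is defined, so the following Python sketch is not the paper's algorithm as stated on this page. It only illustrates the general idea of a best-first search that consults both a policy and a heuristic. The priority (g + h) / pi, the function name `policy_heuristic_search`, the `successors` interface, and the toy line-world domain are all assumptions made for illustration.

```python
import heapq
from itertools import count


def policy_heuristic_search(start, is_goal, successors, heuristic, budget=100_000):
    """Best-first search ordered by (g + h) / pi (an assumed priority).

    successors(state) yields (child, step_cost, prob) triples, where prob is
    the policy's probability of the action that leads to child.
    Returns (path, expansions), with path None if no solution was found.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    # Heap entries: (priority, tie, state, g, pi, path)
    frontier = [(float(heuristic(start)), next(tie), start, 0, 1.0, (start,))]
    expanded = set()  # simplification: each state is expanded at most once
    expansions = 0
    while frontier and expansions < budget:
        _, _, state, g, pi, path = heapq.heappop(frontier)
        if is_goal(state):
            return list(path), expansions
        if state in expanded:
            continue
        expanded.add(state)
        expansions += 1
        for child, step_cost, prob in successors(state):
            if prob <= 0.0 or child in expanded:
                continue  # the policy rules this action out, or already done
            child_g = g + step_cost
            child_pi = pi * prob  # probability of the whole action sequence
            priority = (child_g + heuristic(child)) / child_pi
            heapq.heappush(
                frontier,
                (priority, next(tie), child, child_g, child_pi, path + (child,)),
            )
    return None, expansions


# Toy line world (hypothetical): walk from 0 to GOAL on the integers.
GOAL = 3


def line_successors(state):
    # The assumed policy prefers stepping right (0.9) over stepping left (0.1).
    yield state + 1, 1, 0.9
    yield state - 1, 1, 0.1


path, expansions = policy_heuristic_search(
    start=0,
    is_goal=lambda s: s == GOAL,
    successors=line_successors,
    heuristic=lambda s: abs(GOAL - s),
)
print(path, expansions)
```

In this sketch the policy probability of a path divides the priority, so unlikely action sequences are postponed, while the heuristic term h pushes nodes that look far from the goal further back in the queue. The number of expansions returned is one simple instance of the search loss mentioned in the abstract.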
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms
