Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic
Truong-Huy Dinh Nguyen, Wee-Sun Lee, and Tze-Yun Leong

TL;DR
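This paper introduces UCT-Aux, a Monte Carlo Tree Search enhancement that adds an auxiliary arm to each internal node of the search tree and rolls out simulations from that arm with a heuristic policy. The aim is faster convergence to optimal values in states where the heuristic policy is optimal, with approximation similar to the original UCT elsewhere.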
Contribution
The paper proposes UCT-Aux, which adds an auxiliary arm to each internal node of the UCT search tree so that a heuristic policy can be incorporated into the search, targeting planning with large-state-space Markov Decision Processes.
Findings
UCT-Aux outperforms the original UCT algorithm in benchmark tests.
The method is designed to converge quickly to optimal values in states where the heuristic policy is optimal.
Conditions under which UCT-Aux is effective are identified.
Abstract
We consider the problem of using a heuristic policy to improve the value approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in non-adversarial settings such as planning with large-state space Markov Decision Processes. Current improvements to UCT focus on either changing the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose to add an auxiliary arm to each of the internal nodes, and always use the heuristic policy to roll out simulations at the auxiliary arms. The method aims to get fast convergence to optimal values at states where the heuristic policy is optimal, while retaining similar approximation as the original UCT in other states. We show that bootstrapping with the proposed method in the new algorithm, UCT-Aux, performs better compared to the original UCT algorithm and its…
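The auxiliary-arm idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Node`, `ucb_pick`, `simulate`, `plan`, and `chain_step`, the exploration constant `c=1.4`, the uniformly random default rollout, the rule for picking the final action, and the toy chain MDP are all assumptions made for the example. The abstract only states that each internal node gets an extra arm and that simulations from that arm always follow the heuristic policy.

```python
import math
import random

AUX = "aux"  # label for the auxiliary arm (hypothetical name)

class Node:
    def __init__(self):
        self.visits = 0
        self.stats = {}     # arm -> [pull count, total return]
        self.children = {}  # (action, next_state) -> Node

def ucb_pick(node, arms, c):
    # Pull every arm once, then choose by the UCB1 score.
    for arm in arms:
        if arm not in node.stats:
            return arm
    def score(arm):
        n, total = node.stats[arm]
        return total / n + c * math.sqrt(math.log(node.visits) / n)
    return max(arms, key=score)

def rollout(state, depth, step, policy):
    # Follow `policy` until the horizon runs out or the episode ends.
    total = 0.0
    while depth > 0:
        state, reward, done = step(state, policy(state))
        total += reward
        depth -= 1
        if done:
            break
    return total

def simulate(node, state, depth, actions, step, heuristic, rng, c):
    if depth == 0:
        return 0.0
    # The auxiliary arm competes with the real actions in UCB selection.
    arm = ucb_pick(node, actions + [AUX], c)
    if arm == AUX:
        # Auxiliary arm: always roll out with the heuristic policy.
        ret = rollout(state, depth, step, heuristic)
    else:
        nxt, reward, done = step(state, arm)
        key = (arm, nxt)
        if done:
            ret = reward
        elif key not in node.children:
            # New leaf: expand it and use a random default rollout (assumption).
            node.children[key] = Node()
            ret = reward + rollout(nxt, depth - 1, step,
                                   lambda s: rng.choice(actions))
        else:
            ret = reward + simulate(node.children[key], nxt, depth - 1,
                                    actions, step, heuristic, rng, c)
    node.visits += 1
    entry = node.stats.setdefault(arm, [0, 0.0])
    entry[0] += 1
    entry[1] += ret
    return ret

def plan(state, horizon, actions, step, heuristic, n_sims=500, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_sims):
        simulate(root, state, horizon, actions, step, heuristic, rng, c)
    # Assumption: recommend the real action with the highest mean return.
    means = {a: root.stats[a][1] / root.stats[a][0] for a in actions}
    return max(means, key=means.get), root

# Toy chain MDP (hypothetical): start at 0, reward 1 on reaching GOAL.
GOAL = 5

def chain_step(state, action):
    nxt = max(0, state + action)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

# Heuristic "always move right" is optimal on this chain.
best, root = plan(0, 5, [-1, +1], chain_step, lambda s: +1)
print(best, root.stats[AUX])
```

In this toy setting the horizon equals the distance to the goal, so only the all-right trajectory earns a reward. A uniformly random rollout rarely finds it, while the auxiliary arm returns it on every pull, which is the kind of state where the abstract expects the heuristic to help.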
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
