Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning

Stephen Wissow; Masataro Asai

arXiv:2305.09840 · cs.AI · March 30, 2026 · 1 citation

TL;DR

This paper introduces GreedyUCT-Normal, an MCTS/THTS algorithm that uses the UCB1-Normal bandit in place of UCB1 so that node selection adapts to the differing scales of rewards in classical planning, improving planning efficiency.

Contribution

It applies the UCB1-Normal bandit within MCTS/THTS for classical planning, addressing UCB1's lack of adaptation to reward scale and improving planning performance over existing methods.

Findings

01. GreedyUCT-Normal finds more plans with fewer node expansions.

02. The new algorithm outperforms Greedy Best First Search and previous MCTS/THTS algorithms.

03. Handling reward variance improves planning efficiency.

Abstract

Balancing exploration and exploitation has been an important problem in both game tree search and automated planning. However, while the problem has been extensively analyzed within the Multi-Armed Bandit (MAB) literature, the planning community has had limited success when attempting to apply those results. We show that a more detailed theoretical understanding of MAB literature helps improve existing planning algorithms that are based on Monte Carlo Tree Search (MCTS) / Trial Based Heuristic Tree Search (THTS). In particular, THTS uses UCB1 MAB algorithms in an ad hoc manner, as UCB1's theoretical requirement of fixed bounded support reward distributions is not satisfied within heuristic search for classical planning. The core issue lies in UCB1's lack of adaptations to the different scales of the rewards. We propose GreedyUCT-Normal, an MCTS/THTS algorithm with UCB1-Normal bandit for…
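To illustrate the scale issue the abstract describes, the following is a minimal sketch of the exploration bonuses in the textbook UCB1 and UCB1-Normal bandits (Auer, Cesa-Bianchi, and Fischer, 2002), written in their reward-maximizing form. It is not the paper's implementation: the exact form used inside GreedyUCT-Normal is not given in this excerpt, the function names and sample values are hypothetical, and UCB1-Normal's rule for forcing under-sampled arms is omitted for brevity.

```python
import math


def ucb1_bonus(n_parent, n_arm):
    """UCB1 exploration bonus; its analysis assumes rewards in a fixed range such as [0, 1]."""
    return math.sqrt(2.0 * math.log(n_parent) / n_arm)


def ucb1_normal_bonus(rewards, n_parent):
    """UCB1-Normal exploration bonus, which scales with the arm's sample variance."""
    n = len(rewards)
    mean = sum(rewards) / n
    # Unbiased sample variance: (sum of squares - n * mean^2) / (n - 1).
    # Clamped at zero to guard against floating-point round-off.
    var = max(0.0, (sum(r * r for r in rewards) - n * mean * mean) / (n - 1))
    return math.sqrt(16.0 * var * math.log(n_parent - 1) / n)


# Hypothetical reward samples for one arm, and the same samples on a 100x larger scale.
samples = [3.0, 5.0, 4.0, 6.0]
scaled = [100.0 * r for r in samples]
n_parent = 20  # hypothetical total number of pulls at the parent node

fixed = ucb1_bonus(n_parent, len(samples))    # ignores the reward values entirely
small = ucb1_normal_bonus(samples, n_parent)  # bonus at the original scale
large = ucb1_normal_bonus(scaled, n_parent)   # bonus grows with the reward scale
```

In this sketch the UCB1 bonus depends only on visit counts, so it can be swamped by, or can swamp, the mean reward when rewards are not in the assumed range. The UCB1-Normal bonus is proportional to the sample standard deviation, so multiplying every reward by a constant multiplies the bonus by the same constant.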
