Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
Daniel R. Jiang, Lina Al-Kanj, Warren B. Powell

TL;DR
This paper introduces Primal-Dual MCTS, an algorithm that uses sampled information relaxation upper bounds to focus the search on promising actions, so that the recommended action is optimal even when the decision tree is only partially expanded.
Contribution
It proposes a new Monte Carlo Tree Search variant that uses sampled information relaxation upper bounds to ignore branches stemming from highly suboptimal actions, while still recommending an optimal action despite partial tree expansion.
Findings
Primal-Dual MCTS produces deeper, more focused decision trees.
The method shows improved performance over standard MCTS.
It reduces sensitivity to large action spaces.
Abstract
Monte Carlo Tree Search (MCTS), most famously used in game-play artificial intelligence (e.g., the game of Go), is a well-known strategy for constructing approximate solutions to sequential decision problems. Its primary innovation is the use of a heuristic, known as a default policy, to obtain Monte Carlo estimates of downstream values for states in a decision tree. This information is used to iteratively expand the tree towards regions of states and actions that an optimal policy might visit. However, to guarantee convergence to the optimal action, MCTS requires the entire tree to be expanded asymptotically. In this paper, we propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation upper bounds on potential actions, creating the possibility of "ignoring" parts of the tree that stem from highly suboptimal choices. This allows us to prove that despite…
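The two kinds of estimates mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm or its ride-sharing application; it assumes a hypothetical one-shot selling problem in which a price drawn uniformly from [0, 1] is observed at each step and the asset may be sold once. A non-anticipative default policy yields a Monte Carlo lower estimate of the optimal value, while a perfect-information relaxation with zero penalty (the decision maker sees the whole sampled price path in advance) yields a sampled upper bound. All function names and parameters are illustrative assumptions.

```python
import random


def sample_prices(horizon, rng):
    """Sample one path of exogenous information: i.i.d. Uniform(0, 1) prices."""
    return [rng.random() for _ in range(horizon)]


def default_policy_reward(prices, threshold=0.5):
    """Heuristic (default) policy: sell at the first price at or above the
    threshold, or at the final step if none qualifies. It only uses prices
    seen so far, so its average reward is a lower estimate of the optimum."""
    for price in prices[:-1]:
        if price >= threshold:
            return price
    return prices[-1]


def hindsight_reward(prices):
    """Perfect-information relaxation with zero penalty: the whole path is
    known in advance, so the best decision is to sell at the maximum price."""
    return max(prices)


def estimate_bounds(horizon=5, n_samples=10000, seed=0):
    """Average both rewards over the same sampled paths."""
    rng = random.Random(seed)
    lower_total = 0.0
    upper_total = 0.0
    for _ in range(n_samples):
        prices = sample_prices(horizon, rng)
        lower_total += default_policy_reward(prices)
        upper_total += hindsight_reward(prices)
    return lower_total / n_samples, upper_total / n_samples


if __name__ == "__main__":
    lower, upper = estimate_bounds()
    print(f"default-policy estimate (lower): {lower:.3f}")
    print(f"information relaxation bound (upper): {upper:.3f}")
```

In a tree search, a bound of this kind computed for an unexpanded action could be compared against the estimated value of actions already in the tree; if the upper bound is clearly lower, expanding that action may be unnecessary. How Primal-Dual MCTS actually makes that decision is specified in the paper, not in this sketch.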
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
