Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search

Maximilian Weichart

arXiv:2512.21648 · cs.LG · April 28, 2026

TL;DR

This paper introduces a systematic method to derive prior-based UCTs from a broad class of UCBs, leading to variance-aware policies that outperform existing methods in Monte Carlo Tree Search benchmarks.

Contribution

The authors develop Inverse-RPO, a general method for deriving prior-based UCTs from a range of UCBs, and use it to introduce variance-aware UCTs that improve on PUCT without extra computational cost.
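To make the terms concrete, here is a minimal Python sketch of three classic per-action selection scores: UCB1, the AlphaZero-style PUCT with its prior term, and UCB1-Tuned (Auer et al., 2002) as an example of a variance-aware UCB. These are standard bounds, not the prior-based variance-aware UCTs derived in the paper. The names `EdgeStats` and `select` and the constant `c_puct = 1.25` are hypothetical, illustrative choices.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Per-action statistics at a tree node (hypothetical layout)."""
    visits: int = 0
    value_sum: float = 0.0
    value_sq_sum: float = 0.0  # only the variance-aware score reads this
    prior: float = 0.0         # policy prior P(s, a); only PUCT reads this

    def update(self, value: float) -> None:
        # Backup step: accumulate count, sum, and sum of squares.
        self.visits += 1
        self.value_sum += value
        self.value_sq_sum += value * value

    @property
    def mean(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

    @property
    def variance(self) -> float:
        # Empirical (biased) variance from running sums, clipped at 0 for rounding.
        if self.visits == 0:
            return 0.0
        return max(self.value_sq_sum / self.visits - self.mean ** 2, 0.0)


def ucb1(e: EdgeStats, parent_visits: int, c: float = math.sqrt(2)) -> float:
    # UCB1: mean + c * sqrt(ln N / n); unvisited actions are tried first.
    if e.visits == 0:
        return math.inf
    return e.mean + c * math.sqrt(math.log(parent_visits) / e.visits)


def puct(e: EdgeStats, parent_visits: int, c_puct: float = 1.25) -> float:
    # AlphaZero-style PUCT: Q + c_puct * P * sqrt(N) / (1 + n).
    # Unvisited edges use Q = 0 here (an assumption; implementations differ).
    return e.mean + c_puct * e.prior * math.sqrt(parent_visits) / (1 + e.visits)


def ucb1_tuned(e: EdgeStats, parent_visits: int) -> float:
    # UCB1-Tuned: scales the exploration width by an upper estimate of the
    # empirical variance, capped at 1/4 (the maximum variance of a [0, 1] reward).
    if e.visits == 0:
        return math.inf
    log_term = math.log(parent_visits) / e.visits
    v = e.variance + math.sqrt(2 * log_term)
    return e.mean + math.sqrt(log_term * min(0.25, v))


def select(children: dict, score) -> str:
    # Pick the action with the highest score; the tree policy is just `score`.
    parent_visits = max(sum(e.visits for e in children.values()), 1)
    return max(children, key=lambda a: score(children[a], parent_visits))
```

In this sketch, moving from UCB1 to the variance-aware score costs one extra running sum per edge and a different scoring function. How Inverse-RPO combines such a bound with the prior P(s, a) is specified in the paper and is not reproduced here.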

Findings

1. Variance-aware UCTs outperform PUCT in benchmarks.
2. Variance-aware UCTs require only minimal code changes.
3. Inverse-RPO provides a systematic derivation of prior-based UCTs.

Abstract

Monte Carlo Tree Search (MCTS) has profoundly influenced reinforcement learning (RL) by integrating planning and learning in tasks requiring long-horizon reasoning, exemplified by the AlphaZero family of algorithms. Central to MCTS is the search strategy, governed by a tree policy based on an upper confidence bound (UCB) applied to trees (UCT). A key factor in the success of AlphaZero is the introduction of a prior term in the UCB1-based tree policy PUCT, which improves exploration efficiency and thus accelerates training. While many alternative UCBs with stronger theoretical guarantees than UCB1 exist, extending them to prior-based UCTs has been challenging, since PUCT was derived empirically rather than from first principles. Recent work retrospectively justified PUCT by framing MCTS as a regularized policy optimization (RPO) problem. Building on this perspective, we introduce…
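For orientation, the regularized policy optimization view of Grill et al. (2020), which appears to be the "recent work" the abstract refers to, treats the search policy at a node as the solution of a KL-regularized problem. The notation below is ours: q_a are the action-value estimates, π_θ is the prior policy, n_b are the visit counts, and c is an exploration constant.

$$
\bar{\pi} = \arg\max_{y \in \Delta^{|\mathcal{A}|}} \Big[\, q^\top y - \lambda_N \, \mathrm{KL}\big(\pi_\theta \,\|\, y\big) \Big],
\qquad
\lambda_N = c \cdot \frac{\sqrt{\sum_b n_b}}{|\mathcal{A}| + \sum_b n_b}
$$

Setting the gradient of the Lagrangian to zero gives the closed form

$$
\bar{\pi}(a) = \frac{\lambda_N \, \pi_\theta(a)}{\alpha - q_a},
\qquad
\max_b \big(q_b + \lambda_N \pi_\theta(b)\big) \le \alpha \le \max_b q_b + \lambda_N ,
$$

where α is the unique scalar in that interval for which π̄ sums to 1. Grill et al. show that PUCT's empirical visit distribution approximately tracks π̄. The name Inverse-RPO suggests the paper works in the opposite direction, starting from a chosen UCB and recovering the corresponding prior-based UCT.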

Code & Models

Repositories

Max-We/inverse-rpo (GitHub)
