Investigating Scale Independent UCT Exploration Factor Strategies

Robin Schmöcker; Christoph Schnell; Alexander Dockhorn

arXiv:2510.21275·cs.AI·October 27, 2025

Open Access

TL;DR

This paper evaluates adaptive strategies for setting the UCT exploration constant $λ$, both existing ones and five new ones, so that tree search does not depend on the reward scale of the game it is applied to.

Contribution

The paper introduces five new $λ$-strategies for UCT exploration, including a data-driven one that sets $λ = 2 \cdot σ$, where $σ$ is the empirical standard deviation of the state-action pairs' Q-values.

Findings

01. The proposed $λ = 2 \cdot σ$ strategy outperforms existing strategies.

02. Adaptive $λ$-strategies achieve better peak performance and robustness.

03. Experimental results span a wide range of game environments.

Abstract

The Upper Confidence Bounds For Trees (UCT) algorithm is not agnostic to the reward scale of the game it is applied to. For zero-sum games with sparse rewards of $\{-1, 0, 1\}$ at the end of the game, this is not a problem, but many games feature dense rewards with hand-picked reward scales, causing a node's Q-value to span different magnitudes across different games. In this paper, we evaluate various strategies for adaptively choosing the UCT exploration constant $λ$, called $λ$-strategies, that are agnostic to the game's reward scale. These $λ$-strategies include those proposed in the literature as well as five new strategies. Given our experimental results, we recommend using one of our newly suggested $λ$-strategies, which is to choose $λ$ as $2 \cdot σ$ where $σ$ is the empirical standard deviation of all state-action pairs' Q-values…
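
To make the scale issue concrete, here is a minimal sketch of UCT action selection with $λ = 2 \cdot σ$. It assumes the common UCT form $Q(s,a) + λ \sqrt{\ln N(s) / N(s,a)}$. The function names, the use of the population standard deviation, the fallback value for degenerate cases, and the choice of which Q-values are pooled into $σ$ are all illustrative assumptions, since the abstract above is truncated before those details. It is not the authors' implementation.

```python
import math
import statistics


def uct_score(q, n_parent, n_child, lam):
    """Common UCT form: Q + lam * sqrt(ln(N_parent) / N_child)."""
    if n_child == 0:
        return math.inf  # unvisited actions are tried first
    return q + lam * math.sqrt(math.log(n_parent) / n_child)


def lambda_two_sigma(q_values, fallback=1.0):
    """lambda = 2 * sigma, with sigma the standard deviation of q_values.

    Assumptions (not from the paper): population std is used, and a
    hypothetical fallback is returned when sigma is undefined or zero.
    """
    if len(q_values) < 2:
        return fallback
    sigma = statistics.pstdev(q_values)
    return 2.0 * sigma if sigma > 0 else fallback


def select_action(children, pooled_q_values, lam=None):
    """Pick the action with the highest UCT score.

    children: dict mapping action -> (q_value, visit_count).
    pooled_q_values: Q-values used to estimate sigma (assumed pooling).
    lam: optional fixed exploration constant; if None, use 2 * sigma.
    """
    n_parent = sum(visits for _, visits in children.values())
    if lam is None:
        lam = lambda_two_sigma(pooled_q_values)
    return max(
        children,
        key=lambda a: uct_score(children[a][0], n_parent, children[a][1], lam),
    )


# Same decision problem at two reward scales (x1 and x100).
children = {"a": (1.0, 10), "b": (0.8, 3), "c": (0.2, 5)}
scaled = {a: (q * 100.0, n) for a, (q, n) in children.items()}
qs = [q for q, _ in children.values()]
scaled_qs = [q for q, _ in scaled.values()]

adaptive_small = select_action(children, qs)
adaptive_large = select_action(scaled, scaled_qs)
fixed_small = select_action(children, qs, lam=1.0)
fixed_large = select_action(scaled, scaled_qs, lam=1.0)
```

Multiplying every reward by a constant multiplies both the Q-values and $σ$ by that constant, so the ranking of UCT scores under $λ = 2 \cdot σ$ is unchanged. With a fixed $λ$, the exploration term shrinks relative to the Q-values as the scale grows, and the selected action can change.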

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research