Super-Exponential Regret for UCT, AlphaGo and Variants

Laurent Orseau; Remi Munos

arXiv:2405.04407·cs.LG·May 20, 2024

TL;DR

This paper refines the lower-bound proofs for UCT and its variants, including AlphaGo's MCTS. It shows that they can incur super-exponential regret on the D-chain environment, and it fixes an oversight in the original proofs for rewards bounded in [0, 1].

Contribution

It provides corrected and extended lower-bound proofs for UCT, a "polynomial" UCT variant, and AlphaGo's MCTS, showing that each can suffer super-exponential regret on the D-chain environment.
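For readers unfamiliar with UCT, the sketch below shows the selection step it is built on: at every node, UCT applies the UCB1 bandit rule, choosing the child that maximizes its empirical mean reward plus an exploration bonus of c * sqrt(log N / n), where N is the parent's visit count and n the child's. The `Node` class, the function names, and the textbook constant c = sqrt(2) are assumptions for illustration; the exact constants in the analyzed variants may differ. Because the bonus grows only logarithmically in N, a branch that looks poor early on is revisited only rarely.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal node: visit count, reward sum, and child list.
    visits: int = 0
    total_reward: float = 0.0  # rewards assumed to lie in [0, 1]
    children: list = field(default_factory=list)


def ucb1_score(parent_visits: int, child: Node, c: float = math.sqrt(2)) -> float:
    """UCB1 index: empirical mean of the child plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean + bonus


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Pick the child with the largest UCB1 score; ties go to the first child."""
    return max(parent.children, key=lambda ch: ucb1_score(parent.visits, ch, c))


# Usage: both children have equal visits, so the higher empirical mean wins.
root = Node(visits=10, children=[Node(visits=5, total_reward=4.0),
                                 Node(visits=5, total_reward=2.0)])
best = select_child(root)
```

A full UCT iteration would repeat `select_child` down to a leaf, evaluate it, and back the reward up along the path, incrementing `visits` and `total_reward` at each node.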

Findings

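1. UCT can have super-exponential regret on the D-chain environment, of the form exp(... exp(1) ...) with Ω(D) nested exponentials.
2. A "polynomial" UCT variant also exhibits super-exponential regret, of order exp2(exp2(D - O(log D))), on the same environment.
3. The proofs of these bounds are corrected and extended to AlphaGo's MCTS and its descendants.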
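AlphaGo's MCTS and its descendants replace UCB1 with a PUCT-style rule that weights exploration by a policy prior. The sketch below uses the form popularized by AlphaGo and AlphaZero, Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). The function names, the list-based interface, and c_puct = 1.0 are assumptions, and the exact variant analyzed in the paper may differ in its details.

```python
import math


def puct_score(q, prior, parent_visit_sum, child_visits, c_puct=1.0):
    """Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    # c_puct = 1.0 is an arbitrary placeholder; real systems tune this constant.
    exploration = c_puct * prior * math.sqrt(parent_visit_sum) / (1 + child_visits)
    return q + exploration


def select_action(q_values, priors, visit_counts, c_puct=1.0):
    """Return the index of the action with the highest PUCT score."""
    total = sum(visit_counts)  # sum_b N(s,b), the parent's visit count
    scores = [puct_score(q, p, total, n, c_puct)
              for q, p, n in zip(q_values, priors, visit_counts)]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: a high-prior, unvisited action overtakes a slightly better-valued one.
chosen = select_action(q_values=[0.5, 0.4], priors=[0.1, 0.9], visit_counts=[10, 0])
```

Because this bonus scales with sqrt(sum_b N(s,b)) rather than sqrt(log N), it is polynomial in the parent's visit count. That is consistent with the abstract placing AlphaGo's MCTS under the same bound as the "polynomial" UCT variant, though the connection drawn here is an informal reading rather than a statement from the source.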

Abstract

We improve the proofs of the lower bounds of Coquelin and Munos (2007), which demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a 'polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment. The original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to show $\exp_2(\exp_2(D - O(\log D)))$ regret for them as well.
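To convey how large these bounds are, the sketch below evaluates the two expressions from the abstract numerically. The source leaves the constants hidden in Ω(D) and O(log D) unspecified, so the tower height `height` and the constant `c` are hypothetical free parameters. The code illustrates growth rates only; it does not compute the regret of any algorithm. The second bound is handled in log2 space, since log2 of exp2(exp2(D - c*log2 D)) is simply 2**D / D**c.

```python
import math


def exp_tower(height):
    """exp(exp(...exp(1)...)) with `height` applications of exp; inf on overflow."""
    value = 1.0
    for _ in range(height):
        try:
            value = math.exp(value)
        except OverflowError:
            return math.inf  # beyond the largest double-precision value
    return value


def log2_poly_bound(depth, c=1.0):
    """log2 of exp2(exp2(D - c*log2(D))), which equals 2**D / D**c.

    `c` is a hypothetical stand-in for the constant hidden in O(log D).
    """
    return 2.0 ** (depth - c * math.log2(depth))


for h in range(5):
    print(f"tower height {h}: {exp_tower(h):.4g}")
print(f"D = 10, c = 1: regret on the order of 2 ** {log2_poly_bound(10):.1f}")
```

The tower is already about 3.8 million at height 3 and overflows a double at height 4. With c = 1, depth D = 10 gives an exponent of 1024 / 10 = 102.4.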

Taxonomy

Topics: Neural Networks and Applications

Methods: AlphaZero