Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions

Can Cömer; Jannis Blüml; Cedric Derstroff; Kristian Kersting

arXiv:2502.06900 · cs.LG · February 12, 2025

TL;DR

This paper extends the regret analysis of the UCB algorithm to environments with non-deterministic state transitions, deriving polynomial regret concentration bounds that support applying MCTS to decision problems with probabilistic outcomes.

Contribution

The paper proves polynomial regret concentration bounds for UCB under non-deterministic state transitions, extending MCTS's theoretical guarantees to stochastic settings.

Findings

01. Polynomial regret concentration bounds are established for UCB in multi-armed bandit problems with stochastic transitions.

02. The bounds carry over to non-deterministic environments, where actions lead to probabilistic outcomes.

03. The results extend MCTS applicability to real-world decision problems with probabilistic outcomes.

Abstract

Monte Carlo Tree Search (MCTS) has proven effective in solving decision-making problems in perfect information settings. However, its application to stochastic and imperfect information domains remains limited. This paper extends the theoretical framework of MCTS to stochastic domains by addressing non-deterministic state transitions, where actions lead to probabilistic outcomes. Specifically, building on the work of Shah et al. (2020), we derive polynomial regret concentration bounds for the Upper Confidence Bound algorithm in multi-armed bandit problems with stochastic transitions, offering improved theoretical guarantees. Our primary contribution is proving that these bounds also apply to non-deterministic environments, ensuring robust performance in stochastic settings. This broadens the applicability of MCTS to real-world decision-making problems with probabilistic outcomes, such…
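To make the setting in the abstract concrete, the following is a minimal sketch of a UCB-style index policy on a bandit whose arms have non-deterministic outcomes. It is an illustration only, not the paper's algorithm or constants. The arm model (each arm is a list of hypothetical `(probability, mean_reward)` successor states), the function name `ucb_polynomial`, and the bonus parameters `c`, `a`, `b` are all assumptions chosen for readability. The bonus grows polynomially in the round index `t` instead of logarithmically as in classical UCB1.

```python
import math
import random


def ucb_polynomial(arms, horizon, c=1.0, a=0.25, b=0.5, seed=0):
    """Run a UCB-style policy with a polynomial exploration bonus.

    arms: list of arms; each arm is a list of (probability, mean_reward)
          pairs describing its possible successor states (hypothetical model).
    c, a, b: illustrative bonus constants, NOT taken from the paper.
    Returns (pull counts, empirical mean rewards) per arm.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    means = [0.0] * len(arms)

    def pull(arm):
        # Non-deterministic transition: sample a successor state, then
        # draw a Bernoulli reward using that state's mean.
        probs = [p for p, _ in arm]
        mus = [mu for _, mu in arm]
        mu = rng.choices(mus, weights=probs, k=1)[0]
        return 1.0 if rng.random() < mu else 0.0

    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1  # pull every arm once before using the index
        else:
            # Index = empirical mean + polynomial bonus c * t^a / n^b.
            k = max(
                range(len(arms)),
                key=lambda i: means[i] + c * t**a / counts[i] ** b,
            )
        reward = pull(arms[k])
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]  # incremental mean
    return counts, means


if __name__ == "__main__":
    # Arm 0 reaches one of two states (expected reward 0.8); arm 1 is
    # deterministic with expected reward 0.3.
    example_arms = [[(0.5, 0.9), (0.5, 0.7)], [(1.0, 0.3)]]
    print(ucb_polynomial(example_arms, horizon=2000))
```

With a large gap between the arms' expected rewards, the policy should concentrate its pulls on the better arm while still sampling the worse one occasionally, which is the behavior regret bounds of this kind quantify.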

Taxonomy

Topics: Advanced Control Systems Optimization