Entropic Risk-Aware Monte Carlo Tree Search

Pedro P. Santos; Jacopo Silvestrin; Alberto Sardinha; Francisco S. Melo

arXiv:2601.17667 · cs.LG · February 6, 2026

TL;DR

This paper introduces a provably correct Monte Carlo tree search (MCTS) algorithm for risk-aware Markov decision processes (MDPs) with entropic risk measure (ERM) objectives, backed by theoretical guarantees and an empirical evaluation.
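For readers new to the objective, here is a minimal Python sketch of the entropic risk measure of a discrete random variable. It assumes the common convention ERM_beta(X) = (1/beta) log E[exp(beta X)] with beta != 0, under which beta < 0 is risk-averse for rewards and beta > 0 is risk-seeking; some works instead write -(1/beta) log E[exp(-beta X)] with beta > 0, and this page does not say which convention the paper uses. The function name `erm` and the lottery example are hypothetical.

```python
import math

def erm(values, probs, beta):
    """Entropic risk measure of a discrete random variable X (hypothetical helper).

    Assumed convention: ERM_beta(X) = (1/beta) * log E[exp(beta * X)].
    For rewards, beta < 0 is risk-averse and beta > 0 is risk-seeking;
    beta == 0 is treated as the limit, i.e. the plain expectation E[X].
    """
    pairs = [(v, p) for v, p in zip(values, probs) if p > 0]
    if beta == 0:
        return sum(p * v for v, p in pairs)
    # Log-sum-exp trick: shift the exponents by their max to avoid overflow.
    m = max(beta * v for v, _ in pairs)
    s = sum(p * math.exp(beta * v - m) for v, p in pairs)
    return (m + math.log(s)) / beta

# A 50/50 lottery paying 0 or 10 (mean 5).
values, probs = [0.0, 10.0], [0.5, 0.5]
for beta in (-1.0, 0.0, 1.0):
    print(beta, round(erm(values, probs, beta), 3))
```

Running it prints roughly 0.693, 5.0 and 9.307: the risk-averse value sits well below the mean of 5 and the risk-seeking value well above it. The max-shift keeps `math.exp` from overflowing when |beta| times the payoff is large.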

Contribution

The paper presents a new risk-aware MCTS algorithm for ERM objectives, together with a non-asymptotic analysis that establishes convergence guarantees and polynomial regret concentration.

Findings

1. The empirical ERM at the root node converges to the optimal ERM.
2. The algorithm exhibits polynomial regret concentration.
3. The method outperforms relevant baselines in illustrative experiments.
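The first finding concerns the empirical ERM at the root of the search tree. The sketch below illustrates only the simplest ingredient of that idea: a plug-in estimate, (1/beta) log of the sample mean of exp(beta X), for a single two-outcome reward, compared against its closed-form value as the sample size grows. It is not the paper's tree-level analysis. It assumes ERM_beta(X) = (1/beta) log E[exp(beta X)] with beta != 0, and `empirical_erm`, `bernoulli_erm` and all parameter values are hypothetical.

```python
import math
import random

def empirical_erm(samples, beta):
    """Plug-in ERM estimate: (1/beta) * log(mean(exp(beta * x_i))), beta != 0."""
    scaled = [beta * x for x in samples]
    m = max(scaled)  # shift for numerical stability
    mean_exp = sum(math.exp(s - m) for s in scaled) / len(samples)
    return (m + math.log(mean_exp)) / beta

def bernoulli_erm(p, high, beta):
    """Exact ERM of X = high with probability p, and X = 0 otherwise."""
    return math.log((1 - p) + p * math.exp(beta * high)) / beta

rng = random.Random(0)
beta, p, high = -0.5, 0.3, 4.0   # risk-averse; the mean of X is 1.2
exact = bernoulli_erm(p, high, beta)
for n in (10, 100, 1000, 10000, 100000):
    xs = [high if rng.random() < p else 0.0 for _ in range(n)]
    est = empirical_erm(xs, beta)
    print(n, round(est, 4), "abs error:", round(abs(est - exact), 4))
```

The exact value here is about 0.601, and the absolute error should shrink roughly like 1/sqrt(n), though individual runs fluctuate. The plug-in estimator is biased for finite n, since the logarithm is applied after averaging.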

Abstract

We propose a provably correct Monte Carlo tree search (MCTS) algorithm for solving risk-aware Markov decision processes (MDPs) with entropic risk measure (ERM) objectives. We provide a non-asymptotic analysis of our proposed algorithm, showing that the algorithm: (i) is correct in the sense that the empirical ERM obtained at the root node converges to the optimal ERM; and (ii) enjoys polynomial regret concentration. Within an upper confidence bound (UCB)-based tree search algorithm, our algorithm successfully exploits the dynamic programming formulations for solving risk-aware MDPs with ERM objectives introduced in previous work. Finally, we provide a set of illustrative experiments comparing our risk-aware MCTS method against relevant baselines.
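The abstract says the algorithm builds on dynamic programming formulations for ERM objectives from prior work. As a hedged illustration of that kind of recursion, and not of the paper's tree search, the sketch below runs exact finite-horizon value iteration with the backup V_t(s) = max_a ERM_beta[r + V_{t+1}(s')]. This backup is valid for undiscounted finite horizons because exp(beta (r + G')) = exp(beta r) exp(beta G'), so the exponential utility factorizes across steps. The two-state MDP, the (prob, next_state, reward) encoding and the function names are hypothetical; the paper's setting (for example, how discounting is handled) may differ, and an MCTS would replace the known transition probabilities with sampled ones and add UCB-style exploration.

```python
import math

# Hypothetical MDP: MDP[state][action] = list of (prob, next_state, reward).
MDP = {
    "ok": {
        "safe": [(1.0, "ok", 1.0)],
        "risky": [(0.8, "ok", 2.0), (0.2, "broken", 0.0)],
    },
    "broken": {"wait": [(1.0, "broken", 0.0)]},  # absorbing, zero reward
}

def erm_backup(outcomes, V_next, beta):
    """ERM of (reward + next-state value) over one transition.

    Assumed convention: (1/beta) * log E[exp(beta * X)]; beta == 0 falls
    back to the plain expectation (risk-neutral limit).
    """
    xs = [(p, r + V_next[s2]) for p, s2, r in outcomes if p > 0]
    if beta == 0:
        return sum(p * x for p, x in xs)
    m = max(beta * x for _, x in xs)  # log-sum-exp shift
    s = sum(p * math.exp(beta * x - m) for p, x in xs)
    return (m + math.log(s)) / beta

def erm_value_iteration(mdp, horizon, beta):
    """Finite-horizon, undiscounted DP: V_t(s) = max_a ERM_beta[r + V_{t+1}(s')].

    Returns V_0 and a time-dependent greedy policy, policy[t][state] -> action.
    """
    V = {s: 0.0 for s in mdp}  # V_H = 0 at the horizon
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        q = {s: {a: erm_backup(outs, V, beta) for a, outs in acts.items()}
             for s, acts in mdp.items()}
        policy[t] = {s: max(qs, key=qs.get) for s, qs in q.items()}
        V = {s: qs[policy[t][s]] for s, qs in q.items()}
    return V, policy

for beta in (-2.0, 0.0, 0.5):
    V0, pi = erm_value_iteration(MDP, horizon=5, beta=beta)
    print(beta, round(V0["ok"], 3), [pi_t["ok"] for pi_t in pi])
```

With beta = -2 the sketch chooses "safe" at every step (V_0 = 5), whereas the risk-neutral beta = 0 case takes "risky" in the last three steps and reaches a higher expected return of 5.904, accepting a chance of ending in the absorbing "broken" state. The optimal policy is time-dependent, which is why the sketch stores one decision rule per step.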

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods