Entropic Risk-Aware Monte Carlo Tree Search
Pedro P. Santos, Jacopo Silvestrin, Alberto Sardinha, Francisco S. Melo

TL;DR
This paper introduces a provably correct Monte Carlo tree search algorithm for risk-aware Markov decision processes using entropic risk measures, with theoretical guarantees and empirical validation.
Contribution
The paper presents a novel risk-aware MCTS algorithm for ERM objectives, together with a non-asymptotic analysis that establishes convergence of the empirical ERM at the root node and polynomial regret concentration.
Findings
The empirical ERM obtained at the root node converges to the optimal ERM.
The algorithm enjoys polynomial regret concentration.
In illustrative experiments, the algorithm outperforms relevant baselines.
Abstract
We propose a provably correct Monte Carlo tree search (MCTS) algorithm for solving risk-aware Markov decision processes (MDPs) with entropic risk measure (ERM) objectives. We provide a non-asymptotic analysis of our proposed algorithm, showing that the algorithm: (i) is correct in the sense that the empirical ERM obtained at the root node converges to the optimal ERM; and (ii) enjoys polynomial regret concentration. Our algorithm successfully exploits the dynamic programming formulations for solving risk-aware MDPs with ERM objectives introduced by previous works in the context of an upper confidence bound-based tree search algorithm. Finally, we provide a set of illustrative experiments comparing our risk-aware MCTS method against relevant baselines.
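As a concrete illustration of the quantity the abstract calls the empirical ERM, the following is a minimal sketch of estimating an entropic risk measure from sampled returns. It is not the authors' algorithm. It assumes the reward-based convention ERM_beta(X) = -(1/beta) log E[exp(-beta X)], where beta > 0 is risk-averse and beta -> 0 recovers the expected return; the paper's sign convention may differ. The function name `empirical_erm` and its parameters are hypothetical.

```python
import numpy as np


def empirical_erm(returns, beta):
    """Empirical entropic risk of sampled returns (assumed reward convention).

    ERM_beta(X) = -(1 / beta) * log E[exp(-beta * X)].
    beta > 0 is risk-averse, beta < 0 is risk-seeking, and beta == 0 is
    treated as the risk-neutral limit (the sample mean).
    """
    x = np.asarray(returns, dtype=float)
    if x.size == 0:
        raise ValueError("need at least one sampled return")
    if beta == 0.0:
        return float(x.mean())
    # Log-mean-exp with the maximum subtracted for numerical stability.
    z = -beta * x
    m = z.max()
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return float(-log_mean_exp / beta)


samples = [0.0, 2.0]
risk_averse = empirical_erm(samples, beta=1.0)    # lies below the mean of 1.0
risk_neutral = empirical_erm(samples, beta=0.0)   # equals the mean
risk_seeking = empirical_erm(samples, beta=-1.0)  # lies above the mean
```

Under this convention, a risk-averse agent values the uncertain return below its mean, which is the property a risk-aware tree search would rely on when comparing actions by their estimated ERM rather than by their average return.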
Taxonomy
Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods
