Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves
Martin Kurečka, Václav Nevyhoštěný, Petr Novotný, Vít Unčovský

TL;DR
Threshold UCT (T-UCT) is a novel Monte Carlo tree search algorithm for constrained Markov decision processes that explicitly estimates Pareto curves to find safe and valuable policies more effectively.
Contribution
T-UCT introduces Pareto curve estimation and new action selection rules to improve safety and value in CMDP planning with MCTS.
Findings
T-UCT outperforms existing methods in safety and value trade-offs.
The algorithm effectively estimates Pareto curves during search.
Experimental results show significant performance improvements.
Abstract
Constrained Markov decision processes (CMDPs), in which the agent optimizes expected payoffs while keeping the expected cost below a given threshold, are the leading framework for safe sequential decision making under stochastic uncertainty. Among algorithms for planning and learning in CMDPs, methods based on Monte Carlo tree search (MCTS) have particular importance due to their efficiency and extendibility to more complex frameworks (such as partially observable settings and games). However, current MCTS-based methods for CMDPs either struggle with finding safe (i.e., constraint-satisfying) policies, or are too conservative and do not find valuable policies. We introduce Threshold UCT (T-UCT), an online MCTS-based algorithm for CMDP planning. Unlike previous MCTS-based CMDP planners, T-UCT explicitly estimates Pareto curves of cost-utility trade-offs throughout the search tree, using…
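To make the notion of a cost-utility Pareto curve concrete, here is a minimal Python sketch. It is not the T-UCT algorithm: it only shows, for a finite set of hypothetical candidate policies summarized as (expected cost, expected utility) pairs, how the non-dominated points can be extracted and how the best point under a cost threshold can be picked. The function names `pareto_front` and `best_under_threshold` and the sample numbers are assumptions made for illustration, and the sketch ignores randomized mixtures of policies, which a full CMDP treatment would also consider.

```python
from typing import List, Optional, Tuple

# Hypothetical summary of a candidate policy: (expected_cost, expected_utility).
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing cost.

    A point is dominated if some other point has cost <= and utility >=,
    with at least one of the two inequalities strict.
    """
    # Sort by cost ascending; among equal costs, put the higher utility first.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best_utility = float("-inf")
    for cost, utility in ordered:
        # Keep a point only if it beats the utility of every cheaper point.
        if utility > best_utility:
            front.append((cost, utility))
            best_utility = utility
    return front


def best_under_threshold(front: List[Point], threshold: float) -> Optional[Point]:
    """Pick the highest-utility point whose expected cost is within the threshold."""
    feasible = [p for p in front if p[0] <= threshold]
    if not feasible:
        return None  # no candidate satisfies the cost constraint
    return max(feasible, key=lambda p: p[1])


# Illustrative data only; these numbers do not come from the paper.
candidates = [(0.0, 1.0), (0.2, 3.0), (0.3, 2.5), (0.5, 4.0), (0.5, 3.5), (0.1, 0.5)]
front = pareto_front(candidates)
choice = best_under_threshold(front, threshold=0.3)
```

With a cost threshold of 0.3, the selection returns the candidate with cost 0.2 and utility 3.0: the cheaper candidates have lower utility, and the higher-utility candidate at cost 0.5 violates the constraint.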
Taxonomy
Topics: Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis · Topic Modeling
