Monte Carlo Tree Search guided by Symbolic Advice for MDPs

Damien Busatto-Gaston; Debraj Chakraborty; Jean-Francois Raskin

arXiv:2006.04712 · cs.GT · July 17, 2020 · 1 citation

Open Access

TL;DR

This paper augments Monte Carlo tree search (MCTS) for Markov decision processes with symbolic advice, implemented efficiently with QBF and SAT solvers, and shows that the resulting algorithm outperforms plain MCTS and human players on the game Pac-Man.

Contribution

It shows how to use symbolic advice to bias the selection and simulation strategies of MCTS while preserving the classical theoretical guarantees of MCTS, and evaluates the approach experimentally on Pac-Man.

Findings

01

MCTS with symbolic advice outperforms both plain MCTS and human players on Pac-Man.

02

Symbolic advice, used to bias selection and simulation, can be implemented efficiently with QBF and SAT solvers.

03

The augmented algorithm retains the classical theoretical guarantees of MCTS.

Abstract

In this paper, we consider the online computation of a strategy that aims at optimizing the expected average reward in a Markov decision process. The strategy is computed with a receding horizon using Monte Carlo tree search (MCTS). We augment the MCTS algorithm with the notion of symbolic advice and show that its classical theoretical guarantees are maintained. Symbolic advice is used to bias the selection and simulation strategies of MCTS. We describe how to use QBF and SAT solvers to implement symbolic advice efficiently. We illustrate our new algorithm on the popular game Pac-Man and show that its performance exceeds that of plain MCTS as well as that of human players.
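The abstract describes the mechanism only at a high level, so here is a minimal Python sketch of receding-horizon MCTS (UCT-style) in which an advice predicate restricts both the actions considered during selection and the actions sampled during simulation. This restriction is one simple way to bias the search. Everything concrete here is an assumption made for illustration and does not come from the paper: the toy corridor MDP, the hypothetical `advice` predicate ("avoid actions that may enter the ghost cell"), and the names `advised_actions`, `rollout`, `simulate`, and `choose_action`. The paper encodes advice symbolically and implements it with QBF and SAT solvers. This sketch replaces those solver queries with explicit enumeration of one-step successors, which is only feasible for tiny explicit MDPs. It also maximizes the total reward over a fixed horizon, which for a fixed horizon is equivalent to maximizing the average reward over that horizon.

```python
import math
import random

# Hypothetical toy MDP (illustration only, not from the paper): cells 0..4 of
# a corridor. Landing on cell 0 (the "ghost") costs 10; landing on cell 4
# (the "food") earns 1. A move succeeds with probability 0.8, otherwise the
# agent stays where it is.
N_CELLS, GHOST, FOOD = 5, 0, 4
ACTIONS = ("left", "right")


def successors(state, action):
    """Return the [(probability, next_state)] distribution of an action."""
    step = -1 if action == "left" else 1
    target = min(max(state + step, 0), N_CELLS - 1)
    return [(0.8, target), (0.2, state)]


def reward(next_state):
    if next_state == GHOST:
        return -10.0
    return 1.0 if next_state == FOOD else 0.0


def sample_next(state, action, rng):
    """Sample a successor state according to successors()."""
    r, acc = rng.random(), 0.0
    dist = successors(state, action)
    for p, s in dist:
        acc += p
        if r < acc:
            return s
    return dist[-1][1]


def advice(state, action):
    """Hypothetical advice: reject actions that may reach the ghost cell.
    Enumerating successors stands in for the paper's solver queries."""
    return all(s != GHOST for _, s in successors(state, action))


def advised_actions(state):
    allowed = [a for a in ACTIONS if advice(state, a)]
    return allowed or list(ACTIONS)  # fall back if advice rejects everything


class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.visits = 0
        self.n = {}         # action -> number of times selected here
        self.q = {}         # action -> mean return observed after it
        self.children = {}  # (action, next_state) -> Node


def rollout(state, depth, horizon, rng):
    """Simulation phase: random play restricted to advised actions."""
    total = 0.0
    for _ in range(depth, horizon):
        state = sample_next(state, rng.choice(advised_actions(state)), rng)
        total += reward(state)
    return total


def simulate(node, horizon, rng, c=1.4):
    """One MCTS iteration: selection, expansion, simulation, backup."""
    if node.depth == horizon:
        return 0.0
    actions = advised_actions(node.state)  # selection restricted by advice
    untried = [a for a in actions if a not in node.n]
    if untried:
        a = rng.choice(untried)
    else:  # UCB1 over the advised actions only
        a = max(actions, key=lambda b: (
            node.q[b] + c * math.sqrt(math.log(node.visits) / node.n[b])))
    s2 = sample_next(node.state, a, rng)
    child = node.children.get((a, s2))
    if child is not None:
        ret = reward(s2) + simulate(child, horizon, rng, c)
    else:  # expand a new node, then estimate its value with a rollout
        node.children[(a, s2)] = Node(s2, node.depth + 1)
        ret = reward(s2) + rollout(s2, node.depth + 1, horizon, rng)
    node.visits += 1
    node.n[a] = node.n.get(a, 0) + 1
    node.q[a] = node.q.get(a, 0.0) + (ret - node.q.get(a, 0.0)) / node.n[a]
    return ret


def choose_action(state, horizon=4, iterations=500, seed=0):
    """Receding-horizon step: search from the current state and return the
    most visited root action; the caller plays it and searches again."""
    rng = random.Random(seed)
    root = Node(state, 0)
    for _ in range(iterations):
        simulate(root, horizon, rng)
    return max(root.n, key=root.n.get)
```

Under these assumptions, `choose_action(1)` returns `"right"`, because the advice removes `"left"` (which may enter the ghost cell) from the candidate actions at the root. Falling back to all actions when the advice rejects every action is a design choice of this sketch that keeps the search well defined in every state.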
