Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search
John Asmuth, Michael L. Littman

TL;DR
This paper demonstrates that an agent can achieve near Bayes-optimal reinforcement learning by employing Monte-Carlo tree search, specifically Forward Search Sparse Sampling (FSSS), in large or infinite state spaces with theoretical efficiency guarantees.
Contribution
It introduces a method for using FSSS to approximate Bayes-optimal behavior efficiently in unknown MDPs, bridging planning and learning in reinforcement learning.
Findings
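FSSS can be used to act nearly Bayes-optimally in an unknown MDP by planning in the corresponding belief-space MDP.
The agent acts nearly Bayes-optimally on all but a polynomial number of steps, assuming FSSS can be used to act efficiently in any possible underlying MDP.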
The method extends Monte-Carlo tree search to Bayesian reinforcement learning.
Abstract
Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayes-optimal behavior in an unknown MDP is equivalent to optimal behavior in the known belief-space MDP, although the size of this belief-space MDP grows exponentially with the amount of history retained, and is potentially infinite. We show how an agent can use one particular MCTS algorithm, Forward Search Sparse Sampling (FSSS), in an efficient way to act nearly Bayes-optimally for all but a polynomial number of steps, assuming that FSSS can be used to act efficiently in any possible underlying MDP.
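The equivalence between Bayes-optimal behavior and planning in the belief-space MDP can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the belief is a table of transition counts under a symmetric Dirichlet prior with parameter `alpha`, rewards are given by a known function, and the names `belief_step` and `sparse_sample_value` are hypothetical. The planner shown is plain sparse sampling with a fixed branching width, not FSSS itself, which additionally maintains value bounds to prune the search.

```python
import random


def belief_step(counts, state, action, n_states, alpha=1.0, rng=random):
    """One transition in the belief-space MDP (illustrative assumption).

    `counts` maps (state, action, next_state) -> number of observations.
    The next state is drawn from the Dirichlet posterior predictive, and
    the returned belief is a copy of `counts` with that observation added.
    """
    weights = [counts.get((state, action, s2), 0) + alpha
               for s2 in range(n_states)]
    next_state = rng.choices(range(n_states), weights=weights)[0]
    new_counts = dict(counts)  # copy, so the caller's belief is unchanged
    key = (state, action, next_state)
    new_counts[key] = new_counts.get(key, 0) + 1
    return next_state, new_counts


def sparse_sample_value(counts, state, depth, n_states, n_actions, reward,
                        width=3, gamma=0.95, rng=random):
    """Estimate the value of the belief state (counts, state).

    Plain sparse sampling: for each action, draw `width` successors from
    the belief-space model, recurse to `depth`, and take the best average.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in range(n_actions):
        total = 0.0
        for _ in range(width):
            s2, c2 = belief_step(counts, state, action, n_states, rng=rng)
            total += reward(state, action, s2) + gamma * sparse_sample_value(
                c2, s2, depth - 1, n_states, n_actions, reward,
                width=width, gamma=gamma, rng=rng)
        best = max(best, total / width)
    return best
```

Each simulated step both moves the agent and updates the belief, so the number of distinct belief states reachable grows with the depth of the lookahead. This is why the abstract describes the belief-space MDP as exponentially large in the retained history. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.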
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques
