Twice Sequential Monte Carlo for Tree Search
Yaniv Oren, Joery A. de Vries, Pascal R. van der Vaart, Matthijs T. J. Spaan, Wendelin Böhmer

TL;DR
This paper introduces Twice Sequential Monte Carlo Tree Search (TSMCTS), a Sequential Monte Carlo (SMC) based search method for model-based reinforcement learning that scales favorably with search depth, reduces estimator variance, and mitigates path degeneracy.
Contribution
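The paper proposes TSMCTS, which targets the two problems that keep SMC from scaling with search depth: large estimator variance and path degeneracy. It reports that TSMCTS outperforms an SMC baseline and a popular modern version of MCTS as a policy improvement operator.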
Findings
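TSMCTS outperforms the SMC baseline and a popular modern version of MCTS as a policy improvement operator across discrete and continuous environments.
It scales favorably with sequential compute, i.e., increased search depth.
It reduces estimator variance and mitigates the effects of path degeneracy.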
Abstract
Model-based reinforcement learning (RL) methods that leverage search are responsible for many milestone breakthroughs in RL. Sequential Monte Carlo (SMC) recently emerged as an alternative to the Monte Carlo Tree Search (MCTS) algorithm which drove these breakthroughs. SMC is easier to parallelize and more suitable to GPU acceleration. However, it also suffers from large variance and path degeneracy which prevent it from scaling well with increased search depth, i.e., increased sequential compute. To address these problems, we introduce Twice Sequential Monte Carlo Tree Search (TSMCTS). Across discrete and continuous environments TSMCTS outperforms the SMC baseline as well as a popular modern version of MCTS as a policy improvement operator, scales favorably with sequential compute, reduces estimator variance and mitigates the effects of path degeneracy while retaining the properties…
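To make the term "path degeneracy" concrete, the following is a minimal sketch of generic SMC with multinomial resampling. It is not the TSMCTS algorithm or the paper's SMC baseline. The function name `surviving_root_ancestors` and the random placeholder weights are assumptions made for illustration only. The sketch tracks which root-level particle each current particle descends from and counts how many distinct root ancestors remain after each resampling step.

```python
import random


def surviving_root_ancestors(num_particles, depth, seed=0):
    """Count distinct root ancestors left after each resampling step.

    Illustrative only: the weights are random placeholders, not the
    importance weights of any particular search algorithm.
    """
    rng = random.Random(seed)
    # ancestor[i] is the index of the root particle that particle i descends from.
    ancestor = list(range(num_particles))
    counts = []
    for _ in range(depth):
        # Placeholder weights (assumption): strictly positive random numbers.
        weights = [rng.random() + 1e-9 for _ in range(num_particles)]
        # Multinomial resampling: pick parents with probability proportional to weight.
        parents = rng.choices(range(num_particles), weights=weights, k=num_particles)
        # Each resampled particle inherits the root ancestor of its parent.
        ancestor = [ancestor[p] for p in parents]
        counts.append(len(set(ancestor)))
    return counts


if __name__ == "__main__":
    print(surviving_root_ancestors(num_particles=16, depth=30, seed=1))
```

Because every resampling step can only drop ancestors and never reintroduce them, the count is non-increasing in depth. With deeper search, the particles tend to collapse onto a few root ancestors, which is the effect the abstract refers to as path degeneracy.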
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Robot Manipulation and Learning
