ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang

TL;DR
ReZero introduces a backward-view reanalysis method for MCTS algorithms that reduces search time and enhances training speed without sacrificing sample efficiency, demonstrated across various game environments.
Contribution
The paper proposes ReZero, a novel reanalysis approach that reuses value estimates to cut down search costs in MCTS-based algorithms, improving training speed and efficiency.
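The value-reuse idea can be made concrete with a minimal root-level sketch. This is not the authors' implementation; it is an illustration under stated assumptions. It uses a simplified PUCT-style selection rule at the root and a hypothetical `simulate(action)` callback that stands in for descending into and searching the subtree below one child. When one child already has a value estimate, passed as the hypothetical `reused` pair `(action, value)`, selecting that child returns the known value directly. This follows the one-armed bandit view, in which one arm's payoff is already known, so no subtree search is spent on that child.

```python
import math
from typing import Callable, List, Optional, Tuple


def root_search_with_reuse(
    priors: List[float],
    simulate: Callable[[int], float],
    num_simulations: int,
    reused: Optional[Tuple[int, float]] = None,
    c_puct: float = 1.25,
) -> Tuple[float, List[int], int]:
    """Toy root-level search (illustrative, hypothetical API).

    priors:   policy prior for each root action.
    simulate: hypothetical stand-in for searching the subtree under one child;
              returns a value estimate for that child.
    reused:   optional (action, value) pair whose value is already known,
              so that child is never expanded.
    Returns (root value, visit counts, number of simulate() calls).
    """
    n = len(priors)
    visits = [0] * n
    value_sum = [0.0] * n
    simulate_calls = 0
    for _ in range(num_simulations):
        total_visits = sum(visits)
        # Simplified PUCT score: mean value plus a prior-weighted exploration bonus.
        best_action, best_score = 0, -math.inf
        for a in range(n):
            q = value_sum[a] / visits[a] if visits[a] > 0 else 0.0
            u = c_puct * priors[a] * math.sqrt(total_visits + 1) / (1 + visits[a])
            if q + u > best_score:
                best_action, best_score = a, q + u
        if reused is not None and best_action == reused[0]:
            # Known arm: reuse its value instead of searching its subtree.
            value = reused[1]
        else:
            value = simulate(best_action)
            simulate_calls += 1
        visits[best_action] += 1
        value_sum[best_action] += value
    root_value = sum(value_sum) / max(sum(visits), 1)
    return root_value, visits, simulate_calls


# Usage: compare a full search against one that reuses the value of action 1.
child_values = [0.1, 0.5, 0.2]
_, _, full_calls = root_search_with_reuse(
    [0.2, 0.6, 0.2], lambda a: child_values[a], 20)
_, reuse_visits, reduced_calls = root_search_with_reuse(
    [0.2, 0.6, 0.2], lambda a: child_values[a], 20, reused=(1, 0.5))
print(full_calls, reduced_calls)  # reduced_calls == full_calls - reuse_visits[1]
```

In this toy setting the visit counts and the root value are identical with or without reuse, because the known value equals what `simulate` returns for that child. Only the number of `simulate` calls drops. In a real search, the reused estimate need not match what a fresh subtree search would return.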
Findings
Significantly reduces search cost in MCTS algorithms.
Maintains or improves sample efficiency during training.
Demonstrates effectiveness across multiple game environments.
Abstract
Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which uses the value estimation of a certain child node to save the corresponding sub-tree search time. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs can significantly reduce the search cost and meanwhile…
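The backward-view order and the entire-buffer schedule described in the abstract can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's code. The names `Trajectory`, `reanalyze_backward`, `maybe_reanalyze_buffer`, and the `search(traj, t, reused)` callback are all hypothetical, and `search` stands in for one MCTS run at step `t` that may exploit a known `(action, value)` pair for the child reached by the action actually taken. Each trajectory is walked from its last step to its first, so when step `t` is searched, the root value of step `t + 1` has just been refreshed. The taken action's child can then be valued without a subtree search. Here that value is assumed to be the one-step backup r_t + gamma * v_{t+1}.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Trajectory:
    """Hypothetical container for one stored episode segment."""
    actions: List[int]
    rewards: List[float]
    root_values: List[float] = field(default_factory=list)


# Hypothetical search signature: (trajectory, step, optional known child) -> root value.
SearchFn = Callable[[Trajectory, int, Optional[Tuple[int, float]]], float]


def reanalyze_backward(traj: Trajectory, search: SearchFn, gamma: float) -> None:
    """Reanalyze steps from last to first so step t can reuse step t+1's fresh value."""
    T = len(traj.actions)
    values = [0.0] * T
    for t in reversed(range(T)):
        reused = None
        if t + 1 < T:
            # Assumed child value for the taken action: r_t + gamma * v_{t+1}.
            reused = (traj.actions[t], traj.rewards[t] + gamma * values[t + 1])
        # The last step has no reanalyzed successor, so it is searched without reuse.
        values[t] = search(traj, t, reused)
    traj.root_values = values


def maybe_reanalyze_buffer(
    train_step: int,
    buffer: List[Trajectory],
    interval: int,
    search: SearchFn,
    gamma: float,
) -> bool:
    """Reanalyze every trajectory in the buffer once every `interval` training steps."""
    if train_step % interval != 0:
        return False
    for traj in buffer:
        reanalyze_backward(traj, search, gamma)
    return True
```

The abstract pairs the two designs, and one plausible reading of why is the following. A per-minibatch scheme samples scattered positions, so the successor of a sampled step has usually not just been searched. A periodic pass over the entire buffer can process each trajectory in order and keep the successor's value available.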
Taxonomy
Topics: Algorithms and Data Compression · Neural Networks and Applications
Methods: Batch Normalization · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Residual Connection · ReZero · Monte-Carlo Tree Search · Residual Block · Convolution · Prioritized Experience Replay · Average Pooling
