TL;DR
This paper explores alternative intra-abstraction policies for non-exact abstraction algorithms in Monte Carlo Tree Search, demonstrating that some outperform the standard random tiebreak policy in various environments.
Contribution
It introduces and empirically evaluates new intra-abstraction policies for MCTS, addressing limitations of existing random tiebreak methods.
Findings
Several proposed policies outperform random tiebreak in most environments
Enhanced policies improve sample efficiency of MCTS
Empirical results validate the effectiveness of alternative intra-abstraction strategies
Abstract
One weakness of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be addressed by building and using state and/or action abstractions in parallel to the tree search so that information can be shared among nodes of the same layer. The primary use of abstractions in MCTS is to enhance the Upper Confidence Bound (UCB) value during the tree policy by aggregating the visits and returns of an abstract node. However, this direct use of abstractions does not account for the case where multiple actions with the same parent are in the same abstract node: these would all have the same UCB value, thus requiring a tiebreak rule. In state-of-the-art abstraction algorithms such as pruned On the Go Abstractions (pruned OGA), this case has not been noticed, and a random tiebreak rule was implicitly chosen. In this paper, we propose and empirically evaluate several…
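The tie described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Stats`, `ucb`, and `select_action`, the exploration constant `c`, and the "ground_ucb" tiebreak are all assumptions made for illustration. Sibling actions that share an abstract node receive the same aggregated UCB value, so a tiebreak is needed; the sketch contrasts the implicit random tiebreak with one hypothetical intra-abstraction policy that falls back on each action's own (ground) statistics.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Stats:
    visits: int = 0
    total_return: float = 0.0

    def mean(self) -> float:
        return self.total_return / self.visits if self.visits else 0.0


def ucb(stats: Stats, parent_visits: int, c: float = 1.41) -> float:
    """Standard UCB1 score; unvisited entries get +inf so they are tried first."""
    if stats.visits == 0:
        return math.inf
    return stats.mean() + c * math.sqrt(math.log(parent_visits) / stats.visits)


def select_action(ground, abstract_of, parent_visits, policy="random", rng=random):
    """Pick a child action using abstract-node statistics for the UCB value.

    ground:      dict action -> Stats (per-action statistics)
    abstract_of: dict action -> abstract node id (hypothetical abstraction)
    policy:      "random" (implicit tiebreak) or "ground_ucb" (assumed
                 intra-abstraction policy, for illustration only)
    """
    # Aggregate visits and returns of all actions in the same abstract node.
    abstract = {}
    for action, s in ground.items():
        agg = abstract.setdefault(abstract_of[action], Stats())
        agg.visits += s.visits
        agg.total_return += s.total_return

    # Every action in one abstract node gets the identical score.
    scores = {a: ucb(abstract[abstract_of[a]], parent_visits) for a in ground}
    best = max(scores.values())
    tied = [a for a, v in scores.items() if v == best]

    if len(tied) == 1 or policy == "random":
        return rng.choice(tied)
    # Break the tie inside the abstract node with each action's own statistics.
    return max(tied, key=lambda a: ucb(ground[a], parent_visits))


# Actions "a" and "b" share abstract node "X"; "c" is alone in "Y".
ground = {"a": Stats(10, 8.0), "b": Stats(2, 0.5), "c": Stats(5, 1.0)}
abstract_of = {"a": "X", "b": "X", "c": "Y"}
choice = select_action(ground, abstract_of, parent_visits=17, policy="ground_ucb")
```

In this toy setup the abstract node "X" wins on aggregated UCB, so "a" and "b" tie; the random policy picks either one uniformly, while the ground-statistics tiebreak deterministically picks whichever tied action has the higher ground UCB.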
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The choice of evaluation domains and the number of seeds over which experiments were run seem appropriate.
- The paper has negligible significance. - The first purported contribution is a framework generalizing the formalism of abstractions used for MCTS. This is not a novel contribution. Their framework appears to simply restate bisimilarity and bisimulation metrics. Furthermore, they do not actually use their framework for anything: they describe it but then don't show how it's useful or what insights can be gleaned from it, nor do they use it in the rest of the paper. - The second h
- This paper identifies a previously overlooked issue in state-of-the-art abstraction algorithms for MCTS. It successfully argues that the implicit use of a random tiebreak rule is a significant weakness. - This paper introduces the ASASAP framework to unify different MCTS-based abstraction algorithms. - The proposed policy, UCT, is a "parameter-free drop-in improvement" to replace the random policy used in the OGA. It is a practical, easy-to-implement approach for researchers in this area.
- The paper introduces the ASASAP framework and then proposes different policies. But it is not clear how to derive those policies from the framework, or how the ASASAP framework provides new insights into the intra-abstraction policy problem. - In the experimental setting, some parameters are fixed. There is no discussion of how those parameters affect the results. It would be beneficial to show results with at least one alternative.
- This paper addresses a key problem in making MCTS iterations efficient, and the ideas behind state-action abstraction are sound. - There is a wealth of environments (academic advising, Manufacturer, etc.) to which the MCTS variants are subjected.
- The paper is quite difficult to parse, and the paper overall requires a large amount of revision in order for its ideas to be clearly communicated. Because of this, I perhaps did not understand the paper that well. For example, for readers not very familiar with previous work in state-action abstraction, a short formal presentation of ASAP, ASASAP, and what the authors' contribution is would be nice to have, and not too complex to communicate. - Lack of theoretical depth. The paper contains v
