TL;DR
This paper introduces IPA-UCT, a state abstraction method for UCT search trees that improves sample efficiency by discovering more state equivalences. It outperforms OGA-UCT across a large range of test domains and iteration budgets.
Contribution
The paper proposes a weaker state abstraction condition called IPA, which finds more abstractions in noisy or large-action-space settings, and unifies existing frameworks under p-ASAP and ASASAP.
Findings
IPA-UCT outperforms OGA-UCT in multiple test domains
The weaker abstraction condition increases the number of discoverable abstractions
Both IPA and ASAP are special cases of a more general framework
Abstract
One approach to enhance Monte Carlo Tree Search (MCTS) is to improve its sample efficiency by grouping/abstracting states or state-action pairs and sharing statistics within a group. Though state-action pair abstractions are mostly easy to find in algorithms such as On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT), nearly no state abstractions are found in either noisy or large action space settings due to constraining conditions. We provide theoretical and empirical evidence for this claim, and we slightly alleviate this state abstraction problem by proposing a weaker state abstraction condition that trades a minor loss in accuracy for finding many more abstractions. We name this technique Ideal Pruning Abstractions in UCT (IPA-UCT), which outperforms OGA-UCT (and any of its derivatives) across a large range of test domains and iteration budgets as…
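The idea of sharing statistics within an abstraction group can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `assign` method, and the way groups are identified are all hypothetical, and how groups are discovered (the actual subject of the paper) is left out entirely. The sketch only shows that every member of a group reads and writes one pooled visit count and return sum.

```python
from collections import defaultdict


class PooledStats:
    """Visit count and return sum shared by all members of one group."""

    def __init__(self):
        self.visits = 0
        self.total_return = 0.0

    def mean(self):
        # Average return over all samples pooled into this group.
        return self.total_return / self.visits if self.visits else 0.0


class SharedStatistics:
    """Hypothetical container: maps tree nodes to groups, groups to stats."""

    def __init__(self):
        self.group_of = {}                      # node key -> group id
        self.stats = defaultdict(PooledStats)   # group id -> pooled stats

    def _group(self, node):
        # A node that was never assigned forms its own singleton group.
        return self.group_of.get(node, ("singleton", node))

    def assign(self, node, group):
        self.group_of[node] = group

    def update(self, node, ret):
        # Backing up a return through one member updates the whole group.
        s = self.stats[self._group(node)]
        s.visits += 1
        s.total_return += ret

    def q_value(self, node):
        return self.stats[self._group(node)].mean()

    def visits(self, node):
        return self.stats[self._group(node)].visits


shared = SharedStatistics()
shared.assign(("s1", "a"), "g0")
shared.assign(("s2", "a"), "g0")
shared.update(("s1", "a"), 1.0)
shared.update(("s2", "a"), 0.0)
print(shared.q_value(("s1", "a")), shared.visits(("s2", "a")))
```

Both grouped pairs see two visits after only one sample each, which is the sample-efficiency gain the abstract refers to; the price is that the pooled mean is only meaningful if the grouped nodes really are (near-)equivalent.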
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The motivation of the proposed framework is intuitive and understandable. The paper also provides a principled extension of the abstraction hierarchy which is reasonable. - The authors demonstrate why prior methods struggle to discover meaningful state abstractions and how their framework mitigates this limitation. - Experiments show the effectiveness of the proposed method. The authors also include ablations on a few hyperparameters. Finally, the paper is well written and easy to follow.
- It would help to include more intuition or visualization of how the discovered abstractions influence search behavior in more complex domains. - It is unclear whether the proposed framework can be extended to more complex, high-dimensional environments. Could the authors discuss this?
- **Clear problem diagnosis**: The combinatorial bound (Equation 7) and toy example effectively demonstrate why OGA finds almost no state abstractions. Table 2 in the appendix backs this up empirically.
- **Simple, practical solution**: The UCB-based pruning (Equation 10) is easy to implement and aligns naturally with how UCT allocates samples.
- **Thorough empirical evaluation**: Testing on many domains with multiple budgets, proper confidence intervals (99%), and generalization-focused metri
**Major issues:**
1. **Missing finite-sample analysis**: The soundness guarantee is only asymptotic. There's no analysis of how often J_UCB prunes optimal actions at realistic visit counts (100-1000). This is critical since the method specifically targets low-sample regimes.
2. **Modest empirical gains**: While aggregate improvements are consistent, per-environment parameter-optimized results (Figures 6-7, Appendix A.2) often show ties or minimal improvements. The authors acknowledge gains are
The problem statement is clearly described. The proposed method is simple, directly complements the baseline (ASAP), and is theoretically well established. Empirical results show consistent improvements over the baseline. Limitations of IPA are well discussed.
### Major Weaknesses
- Writing and Readability Issues: The main idea of this paper is novel and sound, but the most degrading part is the writing. Once understood, the theoretical and empirical results are convincing, but reaching that level of understanding required several careful readings. Some sentences are overly lengthy or colloquial, making the paper unnecessarily exhausting to read.
- Delayed Introduction of the Main Method (IPA): The exact mechanism of the proposed method, IPA, does n
- The evaluation design was done well. The authors ran their experiments on a wide array of problem domains and with a suitably high number of random seeds to demonstrate statistical significance. - The motivation for the paper is strong. State abstraction remains a challenging problem in planning, and methods for it are of great interest to the community. - Although I only skimmed the proof, the claim about why ASAP finds few abstractions seems to be supported by their theory.
- The main weakness with the paper is its lack of significance. The method the authors propose, pruning actions with UCB values less than $Q_{max}$, seems to effectively boil down to considering only the $k$ actions with the highest UCB values for search and abstraction for some arbitrary value of $k$. This minor change is not enough to be considered a significant contribution. - If we consider the argument they show for why ASAP fails, the authors do not clearly explain how IPA-UCT explicitly o
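The pruning rule as this reviewer paraphrases it (discard actions whose UCB value falls below $Q_{max}$) can be sketched as follows. This is an illustration under stated assumptions, not the paper's Equation 10, which is not reproduced in these excerpts: the standard UCB1 bonus, the exploration constant `c`, the rule that unvisited actions always survive, and the function names `ucb_value` and `surviving_actions` are all assumptions made for the example.

```python
import math


def ucb_value(q, n, total_visits, c=math.sqrt(2)):
    """Standard UCB1 value (assumed form); unvisited actions get +inf."""
    if n == 0:
        return math.inf
    return q + c * math.sqrt(math.log(total_visits) / n)


def surviving_actions(q_values, visits, c=math.sqrt(2)):
    """Return the actions whose UCB value is at least the best mean Q.

    q_values and visits are dicts keyed by action (hypothetical layout).
    """
    total_visits = sum(visits.values())
    visited = [a for a in q_values if visits[a] > 0]
    if not visited:
        # Nothing is known yet, so nothing can be pruned.
        return sorted(q_values)
    q_max = max(q_values[a] for a in visited)
    return sorted(
        a for a in q_values
        if ucb_value(q_values[a], visits[a], total_visits, c) >= q_max
    )


q = {"a": 0.9, "b": 0.1, "c": 0.5}
n = {"a": 500, "b": 500, "c": 2}
print(surviving_actions(q, n))
```

In this example the well-sampled, clearly worse action `b` is pruned, while the rarely visited action `c` survives because its confidence bonus is still large. Unlike a fixed top-$k$ cutoff, the number of surviving actions here depends on the visit counts, which is relevant to the reviewer's comparison.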
