Discovering State Equivalences in UCT Search Trees By Action Pruning

Robin Schmöcker; Alexander Dockhorn; Bodo Rosenhahn

arXiv:2510.26346·cs.AI·October 31, 2025


4 Reviews

TL;DR

This paper introduces IPA-UCT, a novel state abstraction method for UCT search trees that improves sample efficiency by discovering more state equivalences, outperforming existing methods across various domains.

Contribution

It proposes a weaker state abstraction condition called IPA, enabling more abstractions in noisy or large action spaces, and unifies existing frameworks under p-ASAP and ASASAP.

Findings

01

IPA-UCT outperforms OGA-UCT in multiple test domains

02

The weaker abstraction condition increases the number of discoverable abstractions

03

Both IPA and ASAP are special cases of a more general framework

Abstract

One approach to enhance Monte Carlo Tree Search (MCTS) is to improve its sample efficiency by grouping/abstracting states or state-action pairs and sharing statistics within a group. Though state-action pair abstractions are mostly easy to find in algorithms such as On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT), nearly no state abstractions are found in either noisy or large action space settings due to constraining conditions. We provide theoretical and empirical evidence for this claim, and we slightly alleviate this state abstraction problem by proposing a weaker state abstraction condition that trades a minor loss in accuracy for finding many more abstractions. We name this technique Ideal Pruning Abstractions in UCT (IPA-UCT), which outperforms OGA-UCT (and any of its derivatives) across a large range of test domains and iteration budgets as…
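The statistic sharing described in the abstract can be illustrated with a minimal sketch. It assumes that each ground node (a state or state-action pair) maps to one group object holding pooled visit counts and returns. The class and method names (`GroupStats`, `AbstractionTable`, `merge`, `backup`) are hypothetical and are not taken from the paper or from OGA-UCT. How equivalences are detected is deliberately left out; only the pooling is shown.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Pooled statistics for one abstraction group (hypothetical structure)."""
    visits: int = 0
    total_return: float = 0.0

    @property
    def mean(self) -> float:
        # Mean return over all backups routed to this group.
        return self.total_return / self.visits if self.visits else 0.0


class AbstractionTable:
    """Maps ground node keys to a shared GroupStats object."""

    def __init__(self) -> None:
        self._group_of: dict = {}

    def backup(self, key, ret: float) -> None:
        # A backup through any member updates the whole group's statistics.
        group = self._group_of.setdefault(key, GroupStats())
        group.visits += 1
        group.total_return += ret

    def merge(self, key_a, key_b) -> None:
        # Declare key_a and key_b equivalent and pool their statistics.
        group_a = self._group_of.setdefault(key_a, GroupStats())
        group_b = self._group_of.get(key_b)
        if group_b is group_a:
            return
        if group_b is None:
            self._group_of[key_b] = group_a
            return
        group_a.visits += group_b.visits
        group_a.total_return += group_b.total_return
        # Re-point every member of the old group to the merged group.
        for key, group in list(self._group_of.items()):
            if group is group_b:
                self._group_of[key] = group_a

    def mean(self, key) -> float:
        return self._group_of[key].mean

    def visits(self, key) -> int:
        return self._group_of[key].visits
```

After a merge, every member of a group reports the same visit count and mean, which is the sense in which an abstraction improves sample efficiency: each simulation informs all grouped nodes at once.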

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The motivation of the proposed framework is intuitive and understandable. The paper also provides a principled extension of the abstraction hierarchy which is reasonable.
- The authors demonstrate why prior methods struggle to discover meaningful state abstractions and how their framework mitigates this limitation.
- Experiments show the effectiveness of the proposed method. The authors also include ablations on a few hyperparameters. Finally, the paper is well written and easy to follow.

Weaknesses

- It would be good to include more intuition or visualization on how the discovered abstractions influence search behavior in more complex domains.
- It is questionable whether the proposed framework could be extended to more complex high-dimensional environments. Could the authors include a discussion of this?

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- **Clear problem diagnosis**: The combinatorial bound (Equation 7) and toy example effectively demonstrate why OGA finds almost no state abstractions. Table 2 in the appendix backs this up empirically.
- **Simple, practical solution**: The UCB-based pruning (Equation 10) is easy to implement and aligns naturally with how UCT allocates samples.
- **Thorough empirical evaluation**: Testing on many domains with multiple budgets, proper confidence intervals (99%), and generalization-focused metri

Weaknesses

**Major issues:**
1. **Missing finite-sample analysis**: The soundness guarantee is only asymptotic. There's no analysis of how often J_UCB prunes optimal actions at realistic visit counts (100-1000). This is critical since the method specifically targets low-sample regimes.
2. **Modest empirical gains**: While aggregate improvements are consistent, per-environment parameter-optimized results (Figures 6-7, Appendix A.2) often show ties or minimal improvements. The authors acknowledge gains are

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The problem statement is clearly described. The proposed method is simple, directly complements the baseline (ASAP), and is theoretically well established. Empirical results show constant improvements compared to the baseline. Limitations of IPA are well discussed.

Weaknesses

### Major Weaknesses
- Writing and Readability Issues: The main idea of this paper is novel and sound, but the most degrading part is the writing. Once understood, the theoretical and empirical results are convincing, but reaching that level of understanding required several careful readings. Some sentences are overly lengthy or colloquial, making the paper unnecessarily exhausting to read.
- Delayed Introduction of the Main Method (IPA): The exact mechanism of the proposed method, IPA, does n

Reviewer 04 · Rating 0 · Confidence 4

Strengths

- The evaluation design was done well. The authors ran their experiments on a wide array of problem domains and with a suitably high number of random seeds to demonstrate statistical significance.
- The motivation for the paper is strong. State abstraction remains a challenging problem in planning, and methods to do so are of great interest to the community.
- Although I only skimmed the proof, the claim of why ASAP finds few abstractions seems to be supported by their theory.

Weaknesses

- The main weakness with the paper is its lack of significance. The method the authors propose, pruning actions with UCB values less than $Q_{max}$, seems to effectively boil down to considering only the $k$ actions with the highest UCB values for search and abstraction for some arbitrary value of $k$. This minor change is not enough to be considered a significant contribution.
- If we consider the argument they show for why ASAP fails, the authors do not clearly explain how IPA-UCT explicitly o
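The pruning rule as Reviewer 04 paraphrases it (drop actions whose UCB value is below $Q_{max}$) can be sketched as follows. This is an illustration of the reviewer's description only, not the paper's Equation 10, which is not reproduced on this page. It assumes the standard UCB1 bonus, takes $Q_{max}$ to be the largest mean value among visited actions, and keeps unvisited actions. The function names and the exploration constant `c` are hypothetical.

```python
import math


def ucb(q_mean: float, visits: int, total_visits: int, c: float) -> float:
    """Standard UCB1 value; unvisited actions get an infinite bound."""
    if visits == 0:
        return math.inf
    return q_mean + c * math.sqrt(math.log(total_visits) / visits)


def prune_actions(stats: dict, c: float = math.sqrt(2)) -> set:
    """Return the actions that survive pruning.

    stats maps each action to a (q_mean, visits) pair. An action is kept
    when its upper confidence bound is at least Q_max, the best mean value
    among visited actions (assumed reading of the reviewer's summary).
    """
    total_visits = sum(n for _, n in stats.values())
    visited_means = [q for q, n in stats.values() if n > 0]
    if not visited_means:
        # Nothing has been sampled yet, so nothing can be ruled out.
        return set(stats)
    q_max = max(visited_means)
    return {
        action
        for action, (q, n) in stats.items()
        if ucb(q, n, total_visits, c) >= q_max
    }
```

Under these assumptions the action with the best mean always survives, because its own bound is at least its mean. How many actions remain depends on the visit counts and on `c`, so the surviving set is not a fixed top-$k$ in this sketch.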
