TL;DR
This paper introduces KVDA-UCT, an abstraction algorithm for Monte Carlo Tree Search that groups states whose values differ by a known amount, improving sample efficiency and outperforming the previous state of the art, OGA-UCT.
Contribution
The paper proposes KVDA-UCT, a new framework that relaxes ASAP's requirement that grouped state-action pairs have the same immediate reward, enabling more abstractions and better performance in deterministic environments.
Findings
KVDA-UCT detects more abstractions than OGA-UCT.
KVDA-UCT outperforms OGA-UCT across various environments.
The method introduces no additional parameters.
Abstract
A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node statistics. On-the-Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT) is the state-of-the-art MCTS abstraction algorithm for deterministic environments. It builds its abstraction using the Abstractions of State-Action Pairs (ASAP) framework, which aims to detect states and state-action pairs with the same value under optimal play by analysing the search graph. ASAP, however, requires two state-action pairs to have the same immediate reward, a rigid condition that limits the number of abstractions that can be found and thereby the sample efficiency. In this paper, we break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states…
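The idea of pooling statistics across nodes whose values differ by a known amount can be illustrated with a small sketch. This is not the paper's implementation: the names `Group` and `build_groups` are hypothetical, and the sketch assumes the simplest source of a known difference, namely two state-action pairs in a deterministic environment that reach the same successor with different immediate rewards, so their values differ by exactly the reward gap. Returns are shifted into a shared reference frame before pooling and shifted back when read.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Group:
    """Pooled return statistics, stored relative to a reference member."""
    visits: int = 0
    total: float = 0.0  # sum of returns after removing each member's offset
    offsets: dict = field(default_factory=dict)  # member -> value gap to reference

    def update(self, key, ret):
        # Shift the sampled return into the reference frame, then pool it.
        self.visits += 1
        self.total += ret - self.offsets[key]

    def estimate(self, key):
        # Shift the pooled mean back into this member's own frame.
        if self.visits == 0:
            raise ValueError("no samples in this group yet")
        return self.total / self.visits + self.offsets[key]


def build_groups(transitions):
    """Group state-action pairs that share a successor (illustrative rule only).

    transitions: dict mapping (state, action) -> (reward, next_state),
    assumed deterministic.
    """
    by_successor = defaultdict(list)
    for key, (reward, nxt) in transitions.items():
        by_successor[nxt].append((key, reward))

    groups = {}
    for members in by_successor.values():
        ref_reward = members[0][1]  # first member acts as the reference
        group = Group()
        for key, reward in members:
            group.offsets[key] = reward - ref_reward
            groups[key] = group
    return groups


transitions = {
    ("s1", "a"): (1.0, "t"),
    ("s2", "b"): (3.0, "t"),  # same successor as (s1, a), reward differs by 2
    ("s3", "c"): (0.0, "u"),
}
groups = build_groups(transitions)
g = groups[("s1", "a")]
g.update(("s2", "b"), 7.0)       # one sampled return, observed only at (s2, b)
print(g.estimate(("s1", "a")))   # 5.0
```

Under a strict same-reward condition, `("s1", "a")` and `("s2", "b")` would stay in separate groups; with the offset they share one sample pool, so a return observed at one pair immediately informs the other.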
Peer Reviews
Decision: ICLR 2026 Poster
- The problem tackled by this work is well motivated. Increasing the number of abstractions helps improve the sample efficiency of MCTS.
- The solution proposed in this work is sound. The idea of abstraction using a known difference in values between state(-action) pairs is novel to me.
- I really appreciate Section 2, which is essential for understanding the preliminaries and the related work.
- I believe the results are strong and the proposed abstraction technique does not hurt in terms of p
- I believe some parts of the main paper need to be adjusted for clarity.
- Some tables and figures are placed in the Appendix while being discussed as main results in the main paper. I understand the issue with space, but I believe it is really important for the main paper to stand on its own, independent of the appendix.
- This is especially the case because the appendix is not well presented: some figures are not inserted correctly.
- I believe the baselines benchmarked in this wo
The idea is distinct and sound, and is supported by theoretical guarantees. The background description is thorough and well grounded. The authors adopt adequate counterparts for the comparison experiments, which are conducted on appropriate domains and various tasks. They also consider extending KVDA to stochastic environments for generality.
#### Major Weaknesses
- KVDA–OGA Performance Ambiguity: The biggest concern is that, according to Table 2 and contrary to the authors’ comment, (ε_a, 0)-OGA with 1,000 iterations seems to generally outperform KVDA. If that is the case, the authors might claim that since KVDA abstracts more generously than (ε_a, 0)-OGA, KVDA aggressively finds good paths more efficiently under a low iteration budget. Here, at least I can agree that KVDA is better than (∞, 0)-OGA, si
Elegant and Sound Conceptual Contribution: The core insight, relaxing value-equivalence to known value differences, is a natural and well-motivated extension of ASAP. Figure 1 effectively demonstrates cases where ASAP fails but KVDA succeeds. The difference-accounted aggregation mechanism is theoretically sound, and the proof correctly establishes the exactness (lossless) guarantee.
Strong Empirical Performance (Deterministic): The method is practical, as KVDA-UCT introduces no new parameters. It
Poor Stochastic Performance: The method's advantages largely disappear in the approximate (stochastic) setting. epsilon-t-KVDA exhibits mediocre performance, sometimes significantly underperforming. The authors acknowledge this stems from "faulty abstractions" but offer it only as future work, substantially limiting the method's practical applicability.
Experimental Design Lacks Critical Controls: The use of hand-engineered heuristics to create dense rewards for board games may artificially fav
