AUPO -- Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm
Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn

TL;DR
AUPO is a new reward distribution-based abstraction algorithm for MCTS. It improves decision-making by automatically detecting symmetric actions without needing transition probabilities or a directed acyclic search graph (DAG), and it outperforms standard MCTS on IPPC benchmarks.
Contribution
Introduces AUPO, an automatic, reward distribution-based abstraction method for MCTS that requires neither transition probabilities nor a DAG. This improves symmetry detection and keeps AUPO compatible with other abstraction techniques.
Findings
AUPO outperforms standard MCTS on IPPC benchmarks.
AUPO effectively detects symmetric actions even when the resulting symmetric states are far apart in state space.
AUPO can be combined with other abstraction methods without conflicts.
Abstract
We introduce a novel drop-in modification to the decision policy of Monte Carlo Tree Search (MCTS) that we call AUPO. Comparisons on a range of IPPC benchmark problems show that AUPO clearly outperforms MCTS. AUPO is an automatic action abstraction algorithm that relies solely on reward distribution statistics acquired during the search. Thus, unlike other automatic abstraction algorithms, AUPO requires neither access to transition probabilities nor a directed acyclic search graph to build its abstraction. This allows AUPO to detect symmetric actions that state-of-the-art frameworks such as ASAP struggle with when the resulting symmetric states are far apart in state space. Furthermore, because AUPO only affects the decision policy, it is not mutually exclusive with other abstraction techniques that only affect the tree search.
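The abstract describes AUPO only at a high level. The following is a minimal sketch of the general idea suggested by the name "Abstracted Until Proven Otherwise": actions at the root are treated as equivalent unless their observed return statistics show that they differ, and the decision policy then works on these groups. It is an illustrative assumption, not the authors' algorithm. The grouping rule (overlapping normal-approximation confidence intervals, merged into connected components), the decision rule (best pooled mean, then the most-sampled action), and the function names `confidence_interval`, `group_actions`, and `decide` are all hypothetical.

```python
import math
from statistics import mean, stdev

# Hypothetical sketch, NOT the authors' AUPO implementation: the grouping
# rule (overlapping confidence intervals) and the decision rule (best pooled
# mean, then most-sampled action) are illustrative assumptions.


def confidence_interval(returns, z=1.96):
    """Normal-approximation interval for the mean of a non-empty list of returns."""
    if len(returns) < 2:
        # One sample says nothing about spread: treat it as maximally uncertain.
        return (-math.inf, math.inf)
    m = mean(returns)
    half_width = z * stdev(returns) / math.sqrt(len(returns))
    return (m - half_width, m + half_width)


def group_actions(stats, z=1.96):
    """Partition actions into groups that have not been 'proven' different.

    Actions whose intervals overlap are linked, and each group is a connected
    component of that overlap graph (tracked with a small union-find).
    """
    actions = list(stats)
    intervals = {a: confidence_interval(stats[a], z) for a in actions}
    parent = {a: a for a in actions}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, a in enumerate(actions):
        for b in actions[i + 1:]:
            lo_a, hi_a = intervals[a]
            lo_b, hi_b = intervals[b]
            if lo_a <= hi_b and lo_b <= hi_a:  # intervals overlap
                parent[find(a)] = find(b)

    groups = {}
    for a in actions:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())


def decide(stats, z=1.96):
    """Choose the group with the best pooled mean return, then its most-sampled action."""
    groups = group_actions(stats, z)
    best = max(groups, key=lambda g: mean([r for a in g for r in stats[a]]))
    return max(best, key=lambda a: len(stats[a]))


# Example: root-level returns observed per action during a search.
stats = {
    "left": [1.0, 0.9, 1.1, 1.0],
    "right": [1.0, 1.1, 0.9, 1.0, 1.0],
    "stay": [0.0, 0.1, -0.1, 0.0],
}
print(group_actions(stats))  # [['left', 'right'], ['stay']]
print(decide(stats))  # right
```

Because groups are connected components, a rarely sampled action (with a wide or infinite interval) can merge otherwise distinguishable groups. A practical variant would likely require a minimum visit count before an action takes part in the grouping.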
