Planning and Learning Using Adaptive Entropy Tree Search
Piotr Kozakowski, Mikołaj Pacek, Piotr Miłoś

TL;DR
The paper introduces Adaptive Entropy Tree Search (ANTS), a planning and learning algorithm that significantly outperforms PUCT, the planning component of AlphaZero, on the Atari benchmark and is more robust to hyperparameter choices than previous algorithms.
Contribution
ANTS is a novel algorithm that combines planning and learning in the maximum entropy paradigm. Earlier maximum entropy planning methods fail when combined with learning; ANTS resolves this issue and reaches state-of-the-art performance.
Findings
ANTS significantly outperforms PUCT on the Atari benchmark
ANTS is more robust to hyperparameter choices than previous algorithms
ANTS reaches state-of-the-art performance in planning and learning
Abstract
Recent breakthroughs in Artificial Intelligence have shown that the combination of tree-based planning with deep learning can lead to superior performance. We present Adaptive Entropy Tree Search (ANTS) - a novel algorithm combining planning and learning in the maximum entropy paradigm. Through a comprehensive suite of experiments on the Atari benchmark we show that ANTS significantly outperforms PUCT, the planning component of the state-of-the-art AlphaZero system. ANTS builds upon recent work on maximum entropy planning methods - which however, as we show, fail in combination with learning. ANTS resolves this issue to reach state-of-the-art performance. We further find that ANTS exhibits superior robustness to different hyperparameter choices, compared to the previous algorithms. We believe that the high performance and robustness of ANTS can bring tree search planning one step closer…
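To make the "maximum entropy paradigm" mentioned in the abstract more concrete, here is a minimal sketch of the generic soft (log-sum-exp) value backup and the softmax policy it induces. This is an illustration of standard maximum entropy quantities only, not the ANTS algorithm itself: the function names (`soft_value`, `softmax_policy`, `policy_entropy`) and the example action values are assumptions made for this sketch and do not come from the paper.

```python
import math

def soft_value(q_values, temperature):
    """Soft maximum of action values: tau * log(sum(exp(q / tau))).

    Computed in a numerically stable way by subtracting the maximum first.
    """
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )

def softmax_policy(q_values, temperature):
    """Action probabilities proportional to exp(q / tau)."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action values at one tree node
    for tau in (0.1, 1.0, 10.0):
        pi = softmax_policy(q, tau)
        print(tau, soft_value(q, tau), policy_entropy(pi))
```

In this sketch a lower temperature pushes the policy toward the greedy action and lowers its entropy, while a higher temperature moves it toward uniform. The temperature is therefore the knob that trades off exploitation against entropy in this family of methods.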
Taxonomy
Topics: Artificial Intelligence in Games · Metaheuristic Optimization Algorithms Research · Reinforcement Learning in Robotics
Methods: Softmax · AlphaZero
