Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark
Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce

TL;DR
This paper introduces new width-based planning and learning algorithms for the Atari-2600 games. The best of them, N-CPL, outperforms previously introduced width-based planning and learning algorithms, particularly on games with large branching factors and sparse meaningful rewards.
Contribution
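The paper presents new width-based algorithms inspired by an analysis of the design decisions made by previous width-based planners, of which Novelty guided Critical Path Learning (N-CPL) performs best. It also provides a taxonomy of the Atari-2600 games according to some of their defining characteristics.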
Findings
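N-CPL outperforms the previously introduced width-based planning and learning algorithms on the Atari-2600 games.
N-CPL's advantage holds for games with large branching factors and for games with sparse meaningful rewards.
The taxonomy of the games gives further insight into the behaviour and performance of the introduced algorithms.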
Abstract
We propose new width-based planning and learning algorithms inspired by a careful analysis of the design decisions made by previous width-based planners. The algorithms are applied to the Atari-2600 games, and our best performing algorithm, Novelty guided Critical Path Learning (N-CPL), outperforms the previously introduced width-based planning and learning algorithms π-IW(1), π-IW(1)+ and π-HIW(n, 1). Furthermore, we present a taxonomy of the Atari-2600 games according to some of their defining characteristics. This analysis of the games provides further insight into the behaviour and performance of the algorithms introduced. Namely, for games with large branching factors, and games with sparse meaningful rewards, N-CPL outperforms π-IW(1), π-IW(1)+ and π-HIW(n, 1).
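For readers unfamiliar with the term, the IW(1) family named in the abstract builds on a width-1 novelty test: a breadth-first search keeps a generated state only if it makes at least one atom true for the first time in the search. The sketch below illustrates that generic test only. It is not N-CPL or any of the learning-based variants from the paper, and the state encoding (a frozenset of hashable atoms), the `max_nodes` budget and the toy `line_successors` domain are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, FrozenSet, Hashable, Iterable, List

Atom = Hashable
State = FrozenSet[Atom]


def iw1(
    root: State,
    successors: Callable[[State], Iterable[State]],
    max_nodes: int = 10_000,
) -> List[State]:
    """Breadth-first search with width-1 novelty pruning (generic IW(1) sketch).

    Returns the states in the order they were expanded.
    """
    seen_atoms = set(root)      # every atom observed so far in the search
    queue = deque([root])
    expanded: List[State] = []
    while queue and len(expanded) < max_nodes:
        state = queue.popleft()
        expanded.append(state)
        for child in successors(state):
            new_atoms = child - seen_atoms
            if not new_atoms:
                continue        # not novel: no atom seen for the first time
            seen_atoms |= new_atoms
            queue.append(child)
    return expanded


def line_successors(state: State) -> List[State]:
    """Hypothetical toy domain: an agent on cells 0..3 that moves left or right."""
    (_, x), = state             # each state holds a single ("x", value) atom
    return [frozenset({("x", max(0, x - 1))}), frozenset({("x", min(3, x + 1))})]


if __name__ == "__main__":
    visited = iw1(frozenset({("x", 0)}), line_successors)
    print([sorted(s) for s in visited])
```

In the toy domain every move back to an already visited cell is pruned, so the search expands each cell once. On Atari the atoms would come from some feature encoding of the emulator state, which this sketch deliberately leaves open.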
Taxonomy
Topics: Artificial Intelligence in Games · Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
