Is Policy Learning Overrated?: Width-Based Planning and Active Learning for Atari
Benjamin Ayton, Masataro Asai

TL;DR
This paper introduces Olive, an online active learning method for width-based planning in Atari games, which updates feature representations during planning to improve performance without policy learning.
Contribution
Olive is the first approach to update VAE features online using active learning during planning, significantly improving Atari game performance without policy training.
Findings
Across 55 Atari games, Olive outperforms Rollout-IW (42 games to 11) and VAE-IW (32 games to 20).
Olive surpasses policy-learning methods such as $\pi$-IW and DQN (30 games to 22 and 31 games to 17), even though those methods are trained with a 100x training budget.
Olive achieves state-of-the-art data efficiency on the Atari 100k benchmark.
Abstract
Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches have computed feature vectors for each screen using a hand-designed feature set or a variational autoencoder trained on game screens (VAE-IW), and prune screens that do not have novel features during the search. We propose Olive (Online-VAE-IW), which updates the VAE features online using active learning to maximize the utility of screens observed during planning. Experimental results in 55 Atari games demonstrate that it outperforms Rollout-IW by 42-to-11 and VAE-IW by 32-to-20. Moreover, Olive outperforms existing work based on policy-learning ($\pi$-IW, DQN) trained with 100x training budget by 30-to-22 and 31-to-17, and a state-of-the-art data-efficient reinforcement learning…
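The novelty-based pruning mentioned in the abstract can be illustrated with a minimal sketch. It assumes each screen has already been mapped to a binary feature vector (for example, by thresholding VAE latents) and applies a simplified width-1 novelty test: a screen is kept only if it exhibits at least one (feature index, value) pair not seen earlier in the search. The names `NoveltyTable` and `prune_non_novel` are hypothetical, and this is not the authors' implementation; Rollout-IW and Olive use more refined novelty criteria.

```python
from typing import List, Sequence, Set, Tuple


class NoveltyTable:
    """Records which (feature index, value) atoms have appeared so far."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[int, bool]] = set()

    def is_novel(self, features: Sequence[int]) -> bool:
        """Return True if the state contains at least one unseen atom.

        All atoms of the state are recorded as seen, whether or not
        the state turns out to be novel.
        """
        atoms = {(i, bool(v)) for i, v in enumerate(features)}
        has_new_atom = not atoms.issubset(self.seen)
        self.seen |= atoms
        return has_new_atom


def prune_non_novel(states: Sequence[Sequence[int]]) -> List[Sequence[int]]:
    """Keep only the states that pass the width-1 novelty test, in order."""
    table = NoveltyTable()
    return [s for s in states if table.is_novel(s)]


# Hypothetical binary feature vectors for four screens.
screens = [[1, 0], [1, 0], [0, 0], [1, 0]]
kept = prune_non_novel(screens)
```

Here the second and fourth screens repeat features already seen, so only the first and third survive pruning. The quality of the feature encoding determines which screens count as novel, which is why updating the VAE during planning can change what the search explores.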
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games
