On Bonus-Based Exploration Methods in the Arcade Learning Environment

Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, and Marc G. Bellemare

arXiv:2109.11052 · cs.LG · September 24, 2021 · 21 cites

TL;DR

This paper reevaluates bonus-based exploration methods on Atari 2600 games. It finds that they do not outperform simpler strategies such as epsilon-greedy on easy exploration games and do not benefit from additional training samples on hard exploration games, suggesting that architecture improvements may drive recent successes.

Contribution

The study provides a comprehensive evaluation of bonus-based exploration methods within a unified framework, challenging assumptions about their effectiveness across different Atari games.

Findings

01

Bonus-based methods improve scores on Montezuma's Revenge but do not outperform epsilon-greedy on easier games.

02

Bonus-based methods show no significant gains from additional training samples on hard exploration games.

03

Recent improvements in Montezuma's Revenge likely stem from architecture changes, not exploration strategies.

Abstract

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to a higher score on Montezuma's Revenge, they do not provide meaningful gains over the simpler $\epsilon$-greedy scheme. In fact, we find that methods that perform…
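
To make the two exploration styles contrasted in the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation: the paper pairs Rainbow with several learned bonuses, whereas this sketch assumes a simple tabular visit count with a bonus of beta / sqrt(N(s)). The names `epsilon_greedy`, `CountBonus`, `beta`, and `augmented_reward` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class CountBonus:
    """Illustrative tabular count bonus (an assumption, not the paper's method).

    The agent trains on extrinsic_reward + beta / sqrt(N(s)), so rarely
    visited states look more rewarding than frequently visited ones.
    """

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count N(s) per hashable state

    def augmented_reward(self, state, extrinsic_reward):
        # Count the visit first so the bonus is always finite.
        self.counts[state] += 1
        bonus = self.beta / math.sqrt(self.counts[state])
        return extrinsic_reward + bonus
```

In this sketch, epsilon-greedy leaves the reward untouched and only perturbs action selection, while the bonus changes the reward the learner is trained on and decays as a state is revisited.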

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Digital Games and Media