#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Haoran Tang; Rein Houthooft; Davis Foote; Adam Stooke; Xi Chen; Yan Duan; John Schulman; Filip De Turck; Pieter Abbeel

arXiv:1611.04717 · cs.AI · December 6, 2017 · 344 citations

TL;DR

This paper demonstrates that a simple count-based exploration method using hash codes can achieve near state-of-the-art results in high-dimensional and continuous deep reinforcement learning tasks, challenging the belief that such methods are limited to small state spaces.

Contribution

The authors introduce a straightforward extension of count-based exploration using hash codes, which performs well on complex high-dimensional and continuous RL benchmarks, and analyze key properties of effective hash functions.
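For illustration only, one common way to obtain such hash codes is SimHash, a locality-sensitive hash based on random projections. The formulas below assume a preprocessing function g, a k × D matrix A with i.i.d. standard Gaussian entries, a hash-table count n(·), and a bonus coefficient β; these symbols are introduced here as assumptions and are not defined on this page.

```latex
\phi(s) = \operatorname{sgn}\big(A\, g(s)\big) \in \{-1, 1\}^{k},
\qquad
r^{+}(s) = \frac{\beta}{\sqrt{n\big(\phi(s)\big)}}
```

Under this sketch, the number of bits k sets the granularity of the counts: a small k makes many distinct states collide into the same code, while a large k separates almost every state, so most counts stay at 1 and the bonus carries little information.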

Findings

1. Hash-based exploration achieves near state-of-the-art results on deep RL benchmarks.
2. Simple hash functions can effectively guide exploration in high-dimensional spaces.
3. Domain-dependent learned hash codes can further enhance exploration performance.

Abstract

Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to…
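A minimal Python sketch of counting states through a hash table, assuming SimHash (the signs of random projections) as the hash function and an exploration bonus of the form beta / sqrt(n). The class name SimHashCounter, its parameters k, beta and seed, and the bonus form are illustrative assumptions rather than details given on this page; raw states are projected directly, with no preprocessing step.

```python
import numpy as np
from collections import defaultdict


class SimHashCounter:
    """Hypothetical helper: count states by SimHash code, return a bonus.

    Assumed (not taken from this page): code = sign(A @ s) with A a
    k x D standard Gaussian matrix, and bonus = beta / sqrt(count).
    """

    def __init__(self, state_dim, k=16, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; each row defines one hyperplane (one bit).
        self.A = rng.standard_normal((k, state_dim))
        self.beta = beta
        # Hash table mapping a binary code to its visit count.
        self.counts = defaultdict(int)

    def hash(self, state):
        # One bit per hyperplane: which side of it the state falls on.
        bits = (self.A @ np.asarray(state, dtype=float)) > 0
        # Pack the boolean vector into bytes so it can be a dict key.
        return bits.tobytes()

    def update_and_bonus(self, state):
        # Increment the count for this state's code, then compute the bonus
        # from the updated count, so the first visit yields exactly beta.
        key = self.hash(state)
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])


counter = SimHashCounter(state_dim=4, k=8, beta=0.01)
s = np.array([0.1, -0.2, 0.3, 0.0])
first = counter.update_and_bonus(s)   # beta / sqrt(1)
second = counter.update_and_bonus(s)  # beta / sqrt(2), smaller
```

Because each bit depends only on the sign of a projection, states pointing in similar directions tend to share a code and therefore a count, which is what lets counting work when exact states rarely repeat. Increasing k makes the codes finer and collisions rarer.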

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Autonomous Vehicle Technology and Safety