Distributional Reinforcement Learning for Efficient Exploration

Borislav Mavrin; Shangtong Zhang; Hengshuai Yao; Linglong Kong; Kaiwen Wu; Yaoliang Yu

arXiv:1905.06125·cs.LG·May 16, 2019·30 cites

Open Access

TL;DR

This paper introduces a novel exploration method for deep reinforcement learning using distributional estimates, which improves performance in Atari games and accelerates learning in a 3D driving simulator.

Contribution

The paper presents a new exploration technique for distributional RL that combines a decaying schedule, which suppresses intrinsic uncertainty, with an exploration bonus computed from the upper quantiles of the learned distribution.

Findings

1. Outperforms QR-DQN in 12 of 14 hard Atari 2600 games

2. Achieves a 483% average gain in cumulative rewards over QR-DQN across 49 Atari games

3. Reaches near-optimal safety rewards in the CARLA driving simulator twice as fast as QR-DQN

Abstract

In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain across 49 games in cumulative rewards over QR-DQN, with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
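
The two components named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than something this page states: the schedule form c * sqrt(log t / t), the bonus defined as the spread of the upper-half quantiles around the median, and the hypothetical names decay_schedule, upper_quantile_bonus, and select_action. It is meant only to show how a decaying coefficient and an upper-quantile bonus could combine at action-selection time, not to reproduce the authors' implementation.

```python
import math


def decay_schedule(t, c=1.0):
    """Assumed schedule c * sqrt(log t / t); it shrinks toward 0 as t grows."""
    t = max(t, 2)  # guard: log(1) = 0 and t = 0 would divide by zero
    return c * math.sqrt(math.log(t) / t)


def upper_quantile_bonus(quantiles):
    """Assumed bonus: spread of the upper-half quantiles around the median."""
    q = sorted(quantiles)
    n = len(q)
    median = q[n // 2] if n % 2 else 0.5 * (q[n // 2 - 1] + q[n // 2])
    upper = q[n // 2:]  # quantile estimates at or above the median
    var_plus = sum((x - median) ** 2 for x in upper) / (2 * n)
    return math.sqrt(var_plus)


def select_action(quantiles_per_action, t, c=1.0):
    """Pick the action maximizing mean value plus the decayed bonus."""
    c_t = decay_schedule(t, c)
    scores = [
        sum(q) / len(q) + c_t * upper_quantile_bonus(q)
        for q in quantiles_per_action
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical quantile estimates for two actions at one state.
example_quantiles = [
    [1.0, 1.0, 1.0, 1.0],  # action 0: higher mean, no upper-tail spread
    [0.0, 0.0, 0.0, 3.6],  # action 1: lower mean, heavy upper tail
]
early_choice = select_action(example_quantiles, t=10)
late_choice = select_action(example_quantiles, t=10**6)
```

With these made-up numbers, the heavy-tailed action is preferred early in training, when the coefficient is large, and the higher-mean action is preferred once the coefficient has decayed.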

Taxonomy

Topics: Target Tracking and Data Fusion in Sensor Networks · Reservoir Engineering and Simulation Methods · Advanced Bandit Algorithms Research