Distributional Reinforcement Learning for Efficient Exploration
Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu

TL;DR
This paper introduces an exploration method for deep reinforcement learning based on distributional value estimates; it improves performance on Atari games and speeds up learning in a 3D driving simulator.
Contribution
The paper presents a new exploration technique for distributional RL that combines a decaying schedule, which suppresses intrinsic uncertainty, with an exploration bonus computed from the upper quantiles of the learned value distribution.
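The two components can be sketched as an action-selection rule: score each action by the mean of its quantile estimates plus a decaying coefficient times a bonus computed from its upper quantiles. This is a minimal sketch under stated assumptions. The bonus form (spread of the upper-half quantiles around the median, normalized by 2N) and the c * sqrt(log t / t) schedule are assumed here for illustration and are not formulas given in this summary; `upper_quantile_bonus`, `decay_coefficient`, `select_action`, and the constant `c` are hypothetical names.

```python
import math
from typing import Sequence


def upper_quantile_bonus(quantiles: Sequence[float]) -> float:
    """Assumed bonus: square root of the squared spread of the upper-half
    quantiles around the median, normalized by 2N."""
    q = sorted(quantiles)
    n = len(q)
    median = q[n // 2]
    upper_sq = sum((x - median) ** 2 for x in q[n // 2:])
    return math.sqrt(upper_sq / (2 * n))


def decay_coefficient(t: int, c: float = 1.0) -> float:
    """Assumed schedule c * sqrt(log t / t); it shrinks toward 0 as t grows."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return c * math.sqrt(math.log(t) / t)


def select_action(quantiles_per_action: Sequence[Sequence[float]],
                  t: int, c: float = 1.0) -> int:
    """Greedy choice on quantile mean plus the decayed upper-quantile bonus."""
    c_t = decay_coefficient(t, c)
    scores = [
        sum(q) / len(q) + c_t * upper_quantile_bonus(q)
        for q in quantiles_per_action
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical quantile estimates for two actions (N = 4 quantiles each).
safe = [1.0, 1.0, 1.0, 1.0]    # mean 1.0, no upper spread
risky = [0.0, 0.0, 0.0, 3.6]   # mean 0.9, large upper spread
print(select_action([safe, risky], t=2))      # → 1 (bonus dominates early)
print(select_action([safe, risky], t=10**6))  # → 0 (bonus has decayed)
```

In this toy setting the risky action is tried early because of its upper-tail spread, but once the coefficient has decayed the action with the higher mean wins, even though the risky action's spread never shrinks.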
Findings
Outperforms QR-DQN in 12 of 14 hard Atari games
Achieves a 483% average gain in cumulative reward over QR-DQN across 49 Atari games
Reaches near-optimal safety rewards about twice as fast as QR-DQN in the CARLA driving simulator
Abstract
In distributional reinforcement learning (RL), the estimated distribution of the value function models both the parametric and the intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain in cumulative rewards over QR-DQN across 49 games, with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice as fast as QR-DQN.
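Why the intrinsic part needs suppressing can be illustrated with a toy quantile-regression estimator for a single noisy reward. It uses the same kind of per-quantile update that QR-DQN builds on, simplified here to the plain pinball (quantile) loss instead of the quantile Huber loss, and to one state instead of a network. This is a minimal sketch assuming N quantiles at the midpoints (2i + 1) / (2N) and a fixed step size; `learn_quantiles` and `quantile_midpoints` are hypothetical helper names.

```python
import random
from typing import Callable, List


def quantile_midpoints(n: int) -> List[float]:
    """Quantile levels tau_i = (2i + 1) / (2n) for i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def learn_quantiles(sample: Callable[[], float], n: int,
                    steps: int, lr: float) -> List[float]:
    """Estimate n quantiles of a sampled distribution by SGD on the pinball loss."""
    taus = quantile_midpoints(n)
    theta = [0.0] * n
    for _ in range(steps):
        z = sample()
        for i, tau in enumerate(taus):
            # Subgradient step: theta_i rises with weight tau when z >= theta_i
            # and falls with weight 1 - tau when z < theta_i, so it settles
            # where P(z < theta_i) = tau.
            theta[i] += lr * (tau - (1.0 if z < theta[i] else 0.0))
    return theta


random.seed(0)
# Reward noise: uniform on [0, 1), whose true quantiles equal their levels.
theta = learn_quantiles(random.random, n=4, steps=200_000, lr=1e-3)
print([round(x, 2) for x in theta])  # close to 0.125, 0.375, 0.625, 0.875
```

More samples sharpen the estimates (roughly, the parametric uncertainty shrinks), but the spread between the lowest and highest quantile stays near 0.75 because it reflects the reward noise itself, i.e. the intrinsic uncertainty. A bonus taken directly from that spread would never vanish for a noisy action, which is the motivation the abstract gives for a decaying schedule.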
