The Potential of the Return Distribution for Exploration in RL

Thomas M. Moerland; Joost Broekens; Catholijn M. Jonker

arXiv:1806.04242 · cs.LG · July 4, 2018 · 6 cites

TL;DR

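This paper studies how modeling the return distribution can drive exploration in deterministic RL environments. Combined with exploration policies that use this distribution, it solves, for example, a randomized Chain task of length 100 when learning with neural networks.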

Contribution

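It studies network losses and propagation mechanisms for Gaussian, Categorical, and Gaussian mixture return distributions in deterministic RL, and reports solving a randomized Chain task of length 100, which had not been reported before when learning with neural networks.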

Findings

01 Effective exploration in deterministic RL using return distributions

02 Successful learning on a 100-step randomized Chain task

03 Analysis of network losses for various return distribution models

Abstract

This paper studies the potential of the return distribution for exploration in deterministic reinforcement learning (RL) environments. We study network losses and propagation mechanisms for Gaussian, Categorical and Gaussian mixture distributions. Combined with exploration policies that leverage this return distribution, we solve, for example, a randomized Chain task of length 100, which has not been reported before when learning with neural networks.
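
To make the idea in the abstract concrete, the following is a minimal sketch, assuming a Gaussian return distribution per action. It shows a Gaussian negative log-likelihood, one plausible form of a network loss for that case, and a sample-based exploration rule that draws one return per action and acts greedily on the draws. The function names `gaussian_nll` and `sample_based_action` are hypothetical and are not taken from the authors' repository. The sampling rule is a Thompson-sampling-style choice picked for illustration, since the abstract does not spell out which exploration policies the paper uses.

```python
import math
import random


def gaussian_nll(target, mean, std):
    """Negative log-likelihood of a target return under N(mean, std^2).

    Illustrative loss only; assumes std > 0.
    """
    var = std ** 2
    return 0.5 * math.log(2 * math.pi * var) + (target - mean) ** 2 / (2 * var)


def sample_based_action(means, stds, rng):
    """Draw one return sample per action and pick the action with the largest draw.

    Actions whose return distribution is still wide are occasionally selected
    even when their mean is lower, which is the source of exploration.
    """
    samples = [rng.gauss(m, s) for m, s in zip(means, stds)]
    return max(range(len(samples)), key=lambda a: samples[a])


# Hypothetical two-action state: action 0 is uncertain, action 1 is well known.
rng = random.Random(0)
means = [0.0, 0.1]
stds = [1.0, 0.01]

counts = [0, 0]
for _ in range(1000):
    counts[sample_based_action(means, stds, rng)] += 1
```

In this toy setting the uncertain action keeps being tried although its mean return is lower. In a deterministic environment the spread of the return distribution reflects what the agent has not yet learned rather than environment noise, so the spread would be expected to shrink as learning proceeds and the rule would become greedy.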

Code & Models

Repositories

tmoer/return_distribution_exploration · tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research