The Potential of the Return Distribution for Exploration in RL
Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker

TL;DR
This paper explores how modeling the return distribution can improve exploration in deterministic RL environments. With neural-network function approximation, the resulting agents solve tasks such as a randomized Chain of length 100.
Contribution
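It studies network losses and propagation mechanisms for Gaussian, Categorical and Gaussian mixture return distributions. It combines them with exploration policies that use the return distribution, and it solves a randomized Chain task of length 100. A solution to this task had not previously been reported for learning with neural networks.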
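Below is a minimal sketch of two exploration policies that act on a predicted return distribution. It assumes, as a simplification, that the network outputs a Gaussian return distribution for each action, given as a (mean, std) pair. `thompson_action` draws one return sample per action and acts greedily on the samples. `ucb_action` is an optimistic mean-plus-std rule, shown for contrast. Both function names and the Gaussian parameterization are hypothetical choices for illustration, not the paper's exact policies.

```python
import random
from typing import List, Tuple


def thompson_action(return_dists: List[Tuple[float, float]], rng: random.Random) -> int:
    """Sample one return per action from N(mean, std^2) and take the argmax.

    return_dists[a] = (mean, std) of a hypothetical Gaussian return
    distribution Z(s, a) predicted for the current state s.
    """
    samples = [rng.gauss(mu, sigma) for mu, sigma in return_dists]
    return max(range(len(samples)), key=samples.__getitem__)


def ucb_action(return_dists: List[Tuple[float, float]], c: float = 1.0) -> int:
    """Optimistic alternative: argmax over actions of mean + c * std."""
    scores = [mu + c * sigma for mu, sigma in return_dists]
    return max(range(len(scores)), key=scores.__getitem__)
```

Take the input `[(1.0, 0.0), (0.5, 2.0)]`. With `c=1.0`, `ucb_action` prefers the uncertain second action, because 0.5 + 2.0 > 1.0. Thompson sampling picks that action in roughly 40% of draws, since that is the probability that a sample from N(0.5, 2²) exceeds 1.0.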
Findings
Effective exploration in deterministic RL using return distributions
Successful learning on a randomized Chain task of length 100
Analysis of network losses for various return distribution models
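The losses for the three return-distribution models can be sketched for a single state-action pair. The sketch assumes a scalar target return y, for example r + γ times a sample of the next-state return. Using a scalar target is a simplification. Each head gets its own loss:

- The Gaussian head is trained with negative log-likelihood.
- The categorical head is trained with cross-entropy against the target projected onto a fixed, evenly spaced support. This uses a C51-style linear projection, a standard technique swapped in here.
- The Gaussian mixture head is trained with mixture negative log-likelihood.

All function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log-likelihood of target return y under N(mu, sigma^2)."""
    var = sigma * sigma
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mu) ** 2 / (2.0 * var)


def project_to_atoms(y: float, atoms: Sequence[float]) -> List[float]:
    """Spread a scalar target over the two nearest atoms of an evenly spaced
    support (C51-style linear projection); out-of-range targets are clipped."""
    n = len(atoms)
    v_min, v_max = atoms[0], atoms[-1]
    dz = (v_max - v_min) / (n - 1)
    y = min(max(y, v_min), v_max)
    b = min(max((y - v_min) / dz, 0.0), n - 1.0)  # guard against rounding drift
    lo, hi = math.floor(b), math.ceil(b)
    target = [0.0] * n
    if lo == hi:  # target lands exactly on an atom
        target[lo] = 1.0
    else:
        target[lo] = hi - b
        target[hi] = b - lo
    return target


def categorical_ce(y: float, probs: Sequence[float], atoms: Sequence[float]) -> float:
    """Cross-entropy between the projected target and predicted atom probabilities."""
    target = project_to_atoms(y, atoms)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def mixture_nll(
    y: float,
    weights: Sequence[float],
    mus: Sequence[float],
    sigmas: Sequence[float],
) -> float:
    """NLL of y under a Gaussian mixture, using log-sum-exp for stability."""
    # log(w_k * N(y; mu_k, sigma_k^2)) for each component k
    log_terms = [math.log(w) - gaussian_nll(y, m, s) for w, m, s in zip(weights, mus, sigmas)]
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))
```

The mixture loss reduces to the Gaussian loss when there is a single component, or when all components are identical. This is a convenient sanity check when moving between the two parameterizations.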
Abstract
This paper studies the potential of the return distribution for exploration in deterministic reinforcement learning (RL) environments. We study network losses and propagation mechanisms for Gaussian, Categorical and Gaussian mixture distributions. Combined with exploration policies that leverage this return distribution, we solve, for example, a randomized Chain task of length 100, which has not been reported before when learning with neural networks.
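One way to propagate a Gaussian return distribution through the Bellman equation in a deterministic environment is sketched below. It is an assumption for illustration, not necessarily the authors' propagation mechanism.

Because transitions are deterministic, the spread of the target in this sketch comes only from two sources: the next-state return distributions and the randomness of the policy over next actions. The mixture over a' ~ π(·|s') is moment-matched to a single Gaussian. Its mean is Σ π(a')μ(a'). Its variance is Σ π(a')(σ(a')² + μ(a')²) minus the squared mean. That Gaussian is then shifted by r and scaled by γ. The function name and arguments are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gaussian_bellman_target(
    r: float,
    gamma: float,
    next_pi: Sequence[float],
    next_mus: Sequence[float],
    next_sigmas: Sequence[float],
    terminal: bool = False,
) -> Tuple[float, float]:
    """Return (mean, std) of a Gaussian target for Z(s, a), assuming a
    deterministic transition to s' and moment-matching the mixture of
    next-action return distributions weighted by pi(a' | s')."""
    if terminal:
        return r, 0.0  # no future return after a terminal transition
    mean = sum(p * m for p, m in zip(next_pi, next_mus))
    second = sum(p * (s * s + m * m) for p, m, s in zip(next_pi, next_mus, next_sigmas))
    var = max(second - mean * mean, 0.0)  # clip tiny negative rounding error
    return r + gamma * mean, gamma * math.sqrt(var)
```

Consider a greedy next-action distribution such as `[1.0, 0.0]`. The target then reduces to r + γ·Z(s', a*). A uniform policy over actions with different means widens the target even when every σ(a') is zero.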
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research
