DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
Leshem Choshen, Lior Fox, Yonatan Loewenstein

TL;DR
This paper introduces E-values, a novel model-free method for directed exploration in reinforcement learning that improves learning efficiency and performance, especially in continuous environments like Atari games.
Contribution
The paper proposes E-values as a generalization of counters for directed exploration, addressing their locality issue and enabling efficient learning in continuous MDPs.
Findings
E-values outperform traditional counters in exploration tasks.
E-values improve learning speed and performance in Atari 2600 games.
The method can be integrated with function approximation for continuous environments.
Abstract
Exploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have been proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based solutions to this shortcoming, a model-free approach is still missing. We propose E-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using E-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this…
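The idea of a counter that propagates along state-action trajectories can be illustrated with a small tabular sketch. It assumes, based on our reading of the approach rather than anything stated in the abstract, that E-values start at 1 and are updated with a SARSA-style rule that uses zero reward, and that a "generalized counter" is recovered as the logarithm of the E-value in base (1 - alpha). The names update_e and generalized_count, the dictionary representation, and the parameter values are hypothetical choices made for illustration only.

```python
import math

ALPHA = 0.5    # learning rate (assumed value, for illustration)
GAMMA_E = 0.9  # exploration discount factor (assumed value, for illustration)


def update_e(E, s, a, s_next, a_next, alpha=ALPHA, gamma_e=GAMMA_E):
    """SARSA-style update with zero reward (assumed form of the E-value rule).

    E maps (state, action) pairs to E-values; unseen pairs default to 1.0.
    """
    e_sa = E.get((s, a), 1.0)
    e_next = E.get((s_next, a_next), 1.0)
    E[(s, a)] = (1 - alpha) * e_sa + alpha * gamma_e * e_next
    return E[(s, a)]


def generalized_count(e_value, alpha=ALPHA):
    """Convert an E-value in (0, 1] to a counter: log base (1 - alpha)."""
    return math.log(e_value) / math.log(1 - alpha)


if __name__ == "__main__":
    # With gamma_e = 0 nothing propagates, so the counter is the visit count.
    E_local = {}
    for _ in range(3):
        update_e(E_local, "s", "a", "s2", "a2", gamma_e=0.0)
    print(generalized_count(E_local[("s", "a")]))

    # With gamma_e > 0 an unexplored successor keeps the counter low,
    # so the pair still looks under-explored after one visit.
    E_prop = {}
    update_e(E_prop, "s", "a", "s2", "a2")
    print(generalized_count(E_prop[("s", "a")]))
```

Under these assumptions, setting the exploration discount to zero reduces the scheme to an ordinary local visit counter, while a positive discount lets the novelty of successor state-action pairs flow back to their predecessors, which is the locality issue the abstract refers to.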
Taxonomy
Topics: Open Source Software Innovations
