Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
Sebastian Griesbach, Carlo D'Eramo

TL;DR
This paper introduces Stable Error-seeking Exploration (SEE), an exploration method for deep reinforcement learning that maximizes the TD-error as a separate objective. It is designed to be robust across diverse reward settings, including those that discourage exploration, without requiring hyperparameter tuning.
Contribution
The paper proposes a novel exploration technique, SEE, which maximizes the TD-error and adds three design choices that mitigate the resulting instability, keeping the method stable and robust across different reward settings.
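For reference, the quantity being maximized can be written in its standard one-step Q-learning form. This is the textbook definition, not the paper's exact objective, which this summary does not reproduce; the symbols (Q, gamma, and the transition indices) are assumed notation.

```latex
\delta_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)
```

In actor-critic methods such as Soft Actor-Critic, the bootstrap term is instead evaluated at an action sampled from the policy and includes an entropy term, so the exact form depends on the base algorithm.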
Findings
SEE performs well across dense, sparse, and adverse reward settings.
The method can be integrated with off-policy algorithms without altering their optimization pipeline.
Experimental results show improved robustness and performance when Soft Actor-Critic is combined with SEE.
Abstract
Numerous heuristics and advanced approaches have been proposed for exploration in different settings for deep reinforcement learning. Noise-based exploration generally fares well with dense-shaped rewards and bonus-based exploration with sparse rewards. However, these methods usually require additional tuning to deal with undesirable reward settings by adjusting hyperparameters and noise distributions. Rewards that actively discourage exploration, i.e., with an action cost and no other dense signal to follow, can pose a major challenge. We propose a novel exploration method, Stable Error-seeking Exploration (SEE), that is robust across dense, sparse, and exploration-adverse reward settings. To this end, we revisit the idea of maximizing the TD-error as a separate objective. Our method introduces three design choices to mitigate instability caused by far-off-policy learning, the…
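The general idea of treating the TD-error as a separate exploration objective can be sketched in a few lines of tabular Python. This is a minimal illustration under stated assumptions, not the authors' SEE implementation: it omits the three stabilizing design choices mentioned in the abstract, uses Q-tables instead of deep networks, and all names (`td_error`, `update`, `explore_action`, `q_task`, `q_explore`) are hypothetical.

```python
from collections import defaultdict


def td_error(q, state, action, reward, next_state, actions, gamma=0.99, done=False):
    """One-step Q-learning TD-error: r + gamma * max_a' Q(s', a') - Q(s, a)."""
    bootstrap = 0.0 if done else gamma * max(q[(next_state, a)] for a in actions)
    return reward + bootstrap - q[(state, action)]


def update(q_task, q_explore, transition, actions, lr=0.5, gamma=0.99):
    """Update a task critic and a separate exploration critic from one transition.

    Assumption for illustration: the exploration critic is trained on the
    absolute TD-error of the task critic as its reward signal.
    """
    state, action, reward, next_state, done = transition

    # TD-error of the task critic, computed before the task critic is updated.
    delta = td_error(q_task, state, action, reward, next_state, actions, gamma, done)
    explore_reward = abs(delta)

    # The exploration critic learns to predict where the task critic is "surprised".
    delta_explore = td_error(
        q_explore, state, action, explore_reward, next_state, actions, gamma, done
    )

    q_task[(state, action)] += lr * delta
    q_explore[(state, action)] += lr * delta_explore
    return delta, explore_reward


def explore_action(q_explore, state, actions):
    """Behavior policy: act greedily with respect to the exploration critic."""
    return max(actions, key=lambda a: q_explore[(state, a)])


# Usage: both critics start at zero; the same terminal transition is seen twice.
actions = (0, 1)
q_task, q_explore = defaultdict(float), defaultdict(float)
transition = (0, 1, 1.0, 1, True)  # (state, action, reward, next_state, done)
first = update(q_task, q_explore, transition, actions)
second = update(q_task, q_explore, transition, actions)
```

Because the task critic is trained by its usual update and only the behavior policy changes, a scheme like this leaves the base algorithm's optimization untouched. As the task critic fits a transition, its TD-error there shrinks, so the exploration signal for that transition fades.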
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
