Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization

Sebastian Griesbach; Carlo D'Eramo

arXiv:2506.13345 · cs.LG · October 22, 2025

TL;DR

This paper introduces Stable Error-seeking Exploration (SEE), an exploration method for deep reinforcement learning that maximizes the TD-error as a separate objective. It is designed to be robust across dense, sparse, and exploration-adverse reward settings without additional hyperparameter tuning.

Contribution

The paper proposes a novel exploration technique, SEE, which maximizes the TD-error as a separate objective and introduces three design choices to mitigate the instability this causes, aiming for robustness across dense, sparse, and exploration-adverse reward settings.

Findings

01. SEE performs well across dense, sparse, and exploration-adverse reward settings.

02. The method can be integrated with off-policy algorithms without altering their optimization pipeline.

03. Experimental results show improved robustness and performance of Soft Actor-Critic with SEE.

Abstract

Numerous heuristics and advanced approaches have been proposed for exploration in different settings for deep reinforcement learning. Noise-based exploration generally fares well with dense-shaped rewards, and bonus-based exploration with sparse rewards. However, these methods usually require additional tuning to deal with undesirable reward settings by adjusting hyperparameters and noise distributions. Rewards that actively discourage exploration, i.e., with an action cost and no other dense signal to follow, can pose a major challenge. We propose a novel exploration method, Stable Error-seeking Exploration (SEE), that is robust across dense, sparse, and exploration-adverse reward settings. To this end, we revisit the idea of maximizing the TD-error as a separate objective. Our method introduces three design choices to mitigate instability caused by far-off-policy learning, the…
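The idea the abstract revisits, treating the TD-error as a separate objective to maximize, can be illustrated with a small tabular sketch. This is not the authors' SEE implementation and does not reproduce its three design choices; the function names, the tabular Q-tables, the learning rate, and the use of the absolute TD-error as an exploration reward are all assumptions made for illustration.

```python
import numpy as np


def td_error(q, s, a, r, s_next, done, gamma=0.99):
    """One-step TD error for a tabular Q-function indexed as q[state, action]."""
    bootstrap = 0.0 if done else gamma * np.max(q[s_next])
    return r + bootstrap - q[s, a]


def update_pair(q_task, q_explore, transition, lr=0.1, gamma=0.99):
    """Update a task critic and a separate exploration critic (hypothetical helper).

    The task critic learns from the environment reward. The exploration
    critic learns from |TD error| of the task critic, so actions whose
    outcomes are still poorly predicted look valuable to explore.
    """
    s, a, r, s_next, done = transition
    # TD error of the task critic, computed before the task critic is updated.
    delta = td_error(q_task, s, a, r, s_next, done, gamma)
    q_task[s, a] += lr * delta
    # Assumption: the absolute TD error serves as the exploration reward.
    delta_explore = td_error(q_explore, s, a, abs(delta), s_next, done, gamma)
    q_explore[s, a] += lr * delta_explore
    return delta


# Toy example: 3 states, 2 actions, one terminal transition with reward 1.
q_task = np.zeros((3, 2))
q_explore = np.zeros((3, 2))
delta = update_pair(q_task, q_explore, (0, 1, 1.0, 2, True))
```

In this toy setup an exploration policy would act greedily with respect to `q_explore`, while the task policy is trained on `q_task` from the same transitions. The exploration data is therefore off-policy with respect to the task policy, which is the kind of setting whose instability the abstract says the method's design choices address.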

Code & Models

Repositories

sebastian-griesbach/see (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning