Deterministic Exploration via Stationary Bellman Error Maximization

Sebastian Griesbach; Carlo D'Eramo

arXiv:2410.23840 · cs.LG · November 6, 2024

Open Access

TL;DR

This paper introduces a deterministic exploration method for reinforcement learning that stabilizes Bellman error maximization. The authors report that it outperforms ε-greedy exploration in dense reward environments and remains effective in sparse reward settings.

Contribution

The authors propose three modifications to Bellman error maximization to create a stable, deterministic exploration policy that leverages past experiences.

Findings

01. Outperforms ε-greedy in dense reward environments
02. Effective in sparse reward settings
03. Mitigates instability from off-policy learning

Abstract

Exploration is a crucial and distinctive aspect of reinforcement learning (RL) that remains a fundamental open problem. Several methods have been proposed to tackle this challenge. Commonly used methods inject random noise directly into the actions, indirectly via entropy maximization, or add intrinsic rewards that encourage the agent to steer to novel regions of the state space. Another previously seen idea is to use the Bellman error as a separate optimization objective for exploration. In this paper, we introduce three modifications to stabilize the latter and arrive at a deterministic exploration policy. Our separate exploration agent is informed about the state of the exploitation, thus enabling it to account for previous experiences. Further components are introduced to make the exploration objective agnostic toward the episode length and to mitigate instability introduced by…
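The general idea of using the Bellman error as a separate exploration objective can be illustrated with a small tabular sketch. This is not the authors' method and does not include their three modifications; it only assumes a Q-table setting in which a second table, here called `q_explore` (a hypothetical name), is trained with the absolute Bellman error of the exploitation table `q_exploit` as its reward, and actions are then chosen deterministically by an argmax. All function names, the learning rate, and the discount factor are illustrative assumptions.

```python
import numpy as np


def bellman_error(q, s, a, r, s_next, done, gamma=0.99):
    """One-step Bellman (TD) error of Q-table `q` for a single transition."""
    # Bootstrapped target; no bootstrapping from terminal states.
    target = r + (0.0 if done else gamma * np.max(q[s_next]))
    return target - q[s, a]


def explore_update(q_explore, q_exploit, s, a, r, s_next, done,
                   gamma=0.99, lr=0.1):
    """Q-learning step on the exploration table (illustrative assumption).

    The exploration reward is the absolute Bellman error of the
    exploitation table on the same transition.
    """
    r_explore = abs(bellman_error(q_exploit, s, a, r, s_next, done, gamma))
    target = r_explore + (0.0 if done else gamma * np.max(q_explore[s_next]))
    q_explore[s, a] += lr * (target - q_explore[s, a])
    return r_explore


def explore_action(q_explore, s):
    """Deterministic exploration: pick the action with the highest value."""
    return int(np.argmax(q_explore[s]))


# Usage: 3 states, 2 actions, one observed transition (s=0, a=1, r=1, s'=1).
q_exploit = np.zeros((3, 2))
q_explore = np.zeros((3, 2))
explore_update(q_explore, q_exploit, s=0, a=1, r=1.0, s_next=1, done=False,
               gamma=0.9, lr=0.5)
action = explore_action(q_explore, 0)
```

In this sketch the exploration table prefers transitions on which the exploitation table is still inaccurate, and no random noise is injected into action selection.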

Taxonomy

Topics: Target Tracking and Data Fusion in Sensor Networks · Advanced Control Systems Optimization · Control Systems and Identification