Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration

Amirhossein Roknilamouki; Arnob Ghosh; Eylem Ekici; Ness B. Shroff

arXiv:2603.18326·cs.LG·March 20, 2026

TL;DR

This paper introduces a vector-field reward shaping method for offline reinforcement learning that encourages continued exploration along the boundary of well-covered regions, rather than parking at the frontier, so that agents gather informative data while remaining safe.

Contribution

The paper proposes a novel vector-field reward shaping paradigm that induces continuous, safe boundary exploration in offline RL and avoids the degenerate parking behavior in which an agent stops once it reaches the frontier.

Findings

01. Agents successfully explore uncertainty boundaries in experiments.

02. The reward structure prevents degenerate solutions and promotes sustained exploration.

03. The method balances safe exploration with task performance.

Abstract

While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement learning, exploring near the boundary of regions well covered by the offline dataset and reliably modeled by the simulator allows an agent to take manageable risks, venturing into informative but moderate-uncertainty states while remaining close enough to familiar regions for safe recovery. However, naively rewarding this boundary-seeking behavior can lead to degenerate parking behavior, where the agent simply stops once it reaches the frontier. To solve this, we propose a novel vector-field reward shaping paradigm designed to induce continuous, safe boundary exploration for non-adaptive deployed policies. Operating on an uncertainty oracle trained…
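The parking problem described in the abstract can be illustrated with a toy 2-D sketch. This is not the authors' construction: the uncertainty function, the target level `U_TARGET`, the field, and every function name below are assumptions chosen only to show why a reward based on proximity to the frontier cannot tell a parked agent from a moving one, while a reward based on alignment with a vector field can.

```python
import numpy as np

U_TARGET = 1.0  # hypothetical uncertainty level that defines the frontier


def uncertainty(s):
    """Toy stand-in for an uncertainty oracle: distance from the origin."""
    return float(np.linalg.norm(s))


def naive_reward(s, s_next):
    """Reward only how close the next state is to the frontier."""
    return -abs(uncertainty(s_next) - U_TARGET)


def field(s):
    """Illustrative field: pull toward the frontier plus a flow along it.

    Assumes s is not the origin, where the radial direction is undefined.
    """
    r = np.linalg.norm(s)
    radial = s / r                                # direction of rising uncertainty
    tangent = np.array([-radial[1], radial[0]])   # radial rotated by 90 degrees
    return (U_TARGET - r) * radial + tangent


def shaped_reward(s, s_next):
    """Reward the alignment of the displacement with the field at s."""
    return float(np.dot(s_next - s, field(s)))


def park(s):
    """Degenerate policy: stay where you are."""
    return s.copy()


def circulate(s, angle=0.1):
    """Move counter-clockwise along the level set through s."""
    c, sn = np.cos(angle), np.sin(angle)
    return np.array([[c, -sn], [sn, c]]) @ s


def rollout(step_fn, reward_fn, s0, n_steps=50):
    """Sum rewards over a fixed number of steps from s0."""
    s, total = np.array(s0, dtype=float), 0.0
    for _ in range(n_steps):
        s_next = step_fn(s)
        total += reward_fn(s, s_next)
        s = s_next
    return total


start = [1.0, 0.0]  # already on the frontier
for name, step_fn in [("park", park), ("circulate", circulate)]:
    print(name,
          "naive:", round(rollout(step_fn, naive_reward, start), 6),
          "shaped:", round(rollout(step_fn, shaped_reward, start), 6))
```

Under these assumptions the proximity reward gives both policies essentially the same return, so parking is as good as anything else. The field-aligned reward gives a parked agent zero and gives positive return only to motion along the frontier. A real system would replace the analytic `uncertainty` with a learned estimator and would need a field defined in the task's actual state space.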

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning