Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration
Amirhossein Roknilamouki, Arnob Ghosh, Eylem Ekici, Ness B. Shroff

TL;DR
This paper introduces a vector-field reward shaping method for offline reinforcement learning that encourages safe exploration along the boundary of well-covered regions while avoiding the degenerate "parking" behavior in which the agent stops at the frontier, so that agents can gather informative data while remaining safe.
Contribution
The paper proposes a novel vector-field reward shaping paradigm that induces continuous, safe boundary exploration for non-adaptive deployed policies in offline RL, addressing the degenerate parking behavior that arises when boundary-seeking is rewarded naively.
Findings
Agents successfully explore uncertainty boundaries in experiments.
The reward structure prevents degenerate solutions and promotes sustained exploration.
The method balances safe exploration with task performance.
Abstract
While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement learning, exploring near the boundary of regions well covered by the offline dataset and reliably modeled by the simulator allows an agent to take manageable risks--venturing into informative but moderate-uncertainty states while remaining close enough to familiar regions for safe recovery. However, naively rewarding this boundary-seeking behavior can lead to a degenerate parking behavior, where the agent simply stops once it reaches the frontier. To solve this, we propose a novel vector-field reward shaping paradigm designed to induce continuous, safe boundary exploration for non-adaptive deployed policies. Operating on an uncertainty oracle trained…
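The parking problem described in the abstract can be illustrated with a toy example. The sketch below is not the paper's formulation (the abstract is truncated before the method is specified); it assumes a hypothetical 2D uncertainty function equal to the distance from the origin, a hypothetical frontier level `TAU`, and a hand-built field that is tangent to the frontier with a corrective pull toward it. All names here are illustrative assumptions.

```python
import numpy as np

TAU = 1.0  # hypothetical uncertainty level that defines the frontier


def uncertainty(x):
    # Toy stand-in for an uncertainty oracle (assumption): grows with
    # distance from the origin, so the frontier is the circle of radius TAU.
    return float(np.linalg.norm(x))


def uncertainty_grad(x):
    # Gradient of the toy uncertainty; zero vector at the origin.
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)


def naive_reward(x, v):
    # Position-only boundary reward: it ignores velocity v, so standing
    # still on the frontier is already optimal (the "parking" degeneracy).
    return -abs(uncertainty(x) - TAU)


def field(x):
    # Illustrative vector field (assumption): a unit component tangent to
    # the uncertainty level set plus a pull back toward the frontier.
    g = uncertainty_grad(x)
    tangent = np.array([-g[1], g[0]])        # gradient rotated by 90 degrees
    correction = (TAU - uncertainty(x)) * g  # outward if inside, inward if outside
    return tangent + correction


def vector_field_reward(x, v):
    # Rewards velocity aligned with the field, so zero velocity earns nothing.
    return float(np.dot(v, field(x)))


x = np.array([1.0, 0.0])       # a state exactly on the frontier
parked = np.zeros(2)           # agent standing still
moving = np.array([0.0, 1.0])  # agent moving along the frontier

print(naive_reward(x, parked) == naive_reward(x, moving))  # True
print(vector_field_reward(x, parked))                      # 0.0
print(vector_field_reward(x, moving))                      # 1.0
```

In this toy setting the position-only reward cannot distinguish a parked agent from one that keeps moving, whereas the velocity-aligned reward gives a parked agent nothing and pays for motion along the frontier. How the paper actually constructs its field from the uncertainty oracle is not recoverable from the excerpt above.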
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning
