FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning

Prajwal Koirala; Zhanhong Jiang; Soumik Sarkar; Cody Fleming

arXiv:2412.08880·cs.LG·December 13, 2024

TL;DR

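FAWAC is an offline safe RL method that prioritizes persistent safety in constrained Markov decision processes (CMDPs). It adds a feasibility-informed cost-advantage term to advantage-weighted policy updates, which are made in non-parametric policy space and then projected into parametric space for actor training.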

Contribution

The paper proposes FAWAC, a new offline RL approach that incorporates feasibility conditions and advantage weighting to improve safety and performance in constrained environments.

Findings

01. FAWAC outperforms existing methods on standard benchmarks.

02. It effectively balances safety constraints with reward maximization.

03. The approach handles high-reward but unsafe datasets successfully.

Abstract

Safe offline reinforcement learning aims to learn policies that maximize cumulative rewards while adhering to safety constraints, using only offline data for training. A key challenge is balancing safety and performance, particularly when the policy encounters out-of-distribution (OOD) states and actions, which can lead to safety violations or overly conservative behavior during deployment. To address these challenges, we introduce Feasibility Informed Advantage Weighted Actor-Critic (FAWAC), a method that prioritizes persistent safety in constrained Markov decision processes (CMDPs). FAWAC formulates policy optimization with feasibility conditions derived specifically for offline datasets, enabling safe policy updates in non-parametric policy space, followed by projection into parametric space for constrained actor training. By incorporating a cost-advantage term into Advantage…
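The abstract is cut off before it gives the exact objective, so the following is only a rough sketch of the general idea it names: weighting behavior-cloning terms by an exponentiated advantage that also accounts for a cost advantage. The function names, the linear combination `reward_adv - cost_coef * cost_adv`, and the parameters `temperature`, `cost_coef`, and `max_weight` are all assumptions made for illustration, not the paper's formulation.

```python
import numpy as np


def cost_aware_awr_weights(reward_adv, cost_adv, temperature=1.0,
                           cost_coef=1.0, max_weight=100.0):
    """Illustrative advantage weights that penalize costly actions.

    Assumption: reward and cost advantages are combined linearly. The
    paper's feasibility-informed weighting may differ.
    """
    reward_adv = np.asarray(reward_adv, dtype=float)
    cost_adv = np.asarray(cost_adv, dtype=float)
    # A higher cost advantage lowers the combined score.
    combined = (reward_adv - cost_coef * cost_adv) / temperature
    # Clip the exponent so the weights never exceed max_weight.
    return np.exp(np.minimum(combined, np.log(max_weight)))


def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood of dataset actions under the actor."""
    log_probs = np.asarray(log_probs, dtype=float)
    return -np.mean(weights * log_probs)


# Two dataset actions with equal reward advantage; the second is costlier.
weights = cost_aware_awr_weights(reward_adv=[1.0, 1.0], cost_adv=[0.0, 2.0])
loss = weighted_bc_loss(log_probs=[-0.5, -0.5], weights=weights)
```

In this sketch the costlier action gets the smaller weight, so the actor is pulled less strongly toward it. Clipping the weights is a common stabilization choice in advantage-weighted methods and is likewise an assumption here.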

Taxonomy

Topics: Anomaly Detection Techniques and Applications