Sparsity-based Safety Conservatism for Constrained Offline Reinforcement Learning

Minjae Cho; Chuangchuang Sun

arXiv:2407.13006·cs.LG·July 19, 2024

Open Access

TL;DR

This paper introduces a sparsity-based safety conservatism approach for offline reinforcement learning, focusing on mitigating interpolation errors and enhancing safety in data-sparse, safety-critical environments.

Contribution

It proposes conservative metrics derived from data sparsity to improve safety and generalizability in offline RL without complex bi-level optimization.
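
To make the idea of a sparsity-derived conservative metric concrete, here is a minimal sketch. It is not the paper's formulation: it assumes, purely for illustration, that sparsity is measured as the mean distance to the k nearest dataset points and that the learned cost estimate is inflated linearly by that value. The names `knn_sparsity`, `conservative_cost`, and the weight `beta` are hypothetical.

```python
import math
import random


def knn_sparsity(dataset, query, k=5):
    """Mean Euclidean distance from `query` to its k nearest dataset points.

    Assumption: larger values mean the query lies in a data-sparse region.
    """
    if not dataset:
        raise ValueError("dataset must contain at least one point")
    dists = sorted(math.dist(query, point) for point in dataset)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def conservative_cost(cost_estimate, sparsity, beta=1.0):
    """Inflate a cost estimate in proportion to local sparsity (hypothetical rule)."""
    return cost_estimate + beta * sparsity


# Toy offline dataset: 200 two-dimensional states clustered around the origin.
random.seed(0)
data = [(random.gauss(0.0, 0.1), random.gauss(0.0, 0.1)) for _ in range(200)]

dense_query = (0.0, 0.0)   # well covered by the dataset
sparse_query = (3.0, 3.0)  # far from any recorded state

dense_sparsity = knn_sparsity(data, dense_query)
sparse_sparsity = knn_sparsity(data, sparse_query)

# Same nominal cost estimate, but the sparse region is treated as riskier.
dense_cost = conservative_cost(0.2, dense_sparsity, beta=0.5)
sparse_cost = conservative_cost(0.2, sparse_sparsity, beta=0.5)
```

Under these assumptions, a state far from the recorded data receives a higher conservative cost than a well-covered state with the same nominal estimate, and no inner maximization loop is needed to obtain it.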

Findings

01 Conservative metrics effectively identify high-risk regions in data-sparse areas.

02 The approach outperforms bi-level cost-ub-maximization in safety and simplicity.

03 Method demonstrates robustness across various offline RL tasks.

Abstract

Reinforcement Learning (RL) has achieved notable success in decision-making fields like autonomous driving and robotic manipulation. Yet its reliance on real-time feedback poses challenges in costly or hazardous settings. Furthermore, RL's training approach, centered on "on-policy" sampling, doesn't fully capitalize on data. Hence, Offline RL has emerged as a compelling alternative, particularly where conducting additional experiments is impractical and abundant datasets are available. However, the challenge of distributional shift (extrapolation), indicating the disparity between data distributions and learning policies, also poses a risk in offline RL, potentially leading to significant safety breaches due to estimation errors (interpolation). This concern is particularly pronounced in safety-critical domains, where real-world problems are prevalent. To address both extrapolation and…

Taxonomy

Topics: Occupational Health and Safety Research · Software Reliability and Analysis Research · Risk and Safety Analysis