Sparsity-based Safety Conservatism for Constrained Offline Reinforcement Learning
Minjae Cho, Chuangchuang Sun

TL;DR
This paper introduces a sparsity-based safety conservatism approach for offline reinforcement learning, focusing on mitigating interpolation errors and enhancing safety in data-sparse, safety-critical environments.
Contribution
It proposes conservative metrics derived from data sparsity to improve safety and generalizability in offline RL without complex bi-level optimization.
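One generic way to make "data sparsity" concrete is the mean distance from a state-action pair to its k nearest neighbors in the offline dataset. The sketch below adds that distance to a cost estimate as a conservative penalty. This is an assumption for illustration only, not the paper's metric. The names `knn_sparsity`, `conservative_cost`, and `is_high_risk`, along with the parameters `k`, `beta`, and `threshold`, are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sparsity score (NOT the paper's metric): mean Euclidean
# distance from a query point (e.g., a concatenated state-action vector)
# to its k nearest neighbors in the offline dataset. Larger = sparser.
def knn_sparsity(query: Sequence[float], dataset: List[Sequence[float]], k: int = 3) -> float:
    if not 1 <= k <= len(dataset):
        raise ValueError("k must satisfy 1 <= k <= len(dataset)")
    dists = sorted(math.dist(query, x) for x in dataset)
    return sum(dists[:k]) / k

# Hypothetical conservative cost: inflate a learned cost estimate in
# proportion to sparsity, so poorly covered regions look riskier.
def conservative_cost(base_cost: float, sparsity: float, beta: float = 1.0) -> float:
    return base_cost + beta * sparsity

# Flag a query as high-risk when its sparsity exceeds a chosen threshold.
def is_high_risk(query: Sequence[float], dataset: List[Sequence[float]],
                 threshold: float, k: int = 3) -> bool:
    return knn_sparsity(query, dataset, k) > threshold

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    print(is_high_risk((0.05, 0.05), data, threshold=0.5))  # False: well covered
    print(is_high_risk((2.0, 2.0), data, threshold=0.5))    # True: far from data
```

The score depends only on the dataset and a distance function, so it acts as a per-sample penalty that requires no additional learned model. Density estimators or ensemble disagreement are common alternatives. None of these choices (distance metric, `k`, `beta`, `threshold`) should be read as the authors' design.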
Findings
Conservative metrics effectively identify high-risk regions in data-sparse areas.
The approach is both safer and simpler than bi-level cost-ub-maximization.
The method demonstrates robustness across various offline RL tasks.
Abstract
Reinforcement Learning (RL) has achieved notable success in decision-making fields such as autonomous driving and robotic manipulation. Yet its reliance on real-time feedback poses challenges in costly or hazardous settings. Furthermore, RL's training approach, centered on "on-policy" sampling, does not fully capitalize on available data. Hence, offline RL has emerged as a compelling alternative, particularly when conducting additional experiments is impractical and abundant datasets are available. However, distributional shift (extrapolation), the disparity between the data distribution and the policy being learned, also poses a risk in offline RL and can lead to significant safety breaches through estimation errors (interpolation). This concern is particularly pronounced in safety-critical domains, where many real-world problems lie. To address both extrapolation and…
Taxonomy
Topics: Occupational Health and Safety Research · Software Reliability and Analysis Research · Risk and Safety Analysis
