Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed
Minjae Kwon, Josephine Lamp, Lu Feng

TL;DR
This paper examines whether training-time safety guarantees in reinforcement learning transfer to new, unseen patient populations in diabetes management. It reveals a safety generalization gap and shows that test-time shielding improves safety during deployment.
Contribution
It introduces a benchmark and simulator for studying safety generalization in safe RL under distribution shift, and demonstrates the effectiveness of test-time shielding in restoring safety.
Findings
A safety generalization gap is observed across algorithms and patient groups.
Shielding improves Time-in-Range by 13-14% for strong baselines such as PPO-Lag and CPO.
Shielding also reduces the clinical risk index and glucose variability.
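The findings above are stated in terms of three glycemic metrics. The sketch below shows one common way to compute them from a list of blood-glucose readings in mg/dL. It assumes the widely used 70-180 mg/dL target range for Time-in-Range, the coefficient of variation for glucose variability, and the Kovatchev LBGI + HBGI risk function for the risk index; the paper's exact definitions may differ, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Assumed consensus target range for Time-in-Range, in mg/dL.
TIR_LOW, TIR_HIGH = 70.0, 180.0


def time_in_range(bg):
    """Percentage of readings inside [TIR_LOW, TIR_HIGH] mg/dL."""
    inside = sum(1 for g in bg if TIR_LOW <= g <= TIR_HIGH)
    return 100.0 * inside / len(bg)


def coefficient_of_variation(bg):
    """Glucose variability as population std / mean, in percent."""
    return 100.0 * pstdev(bg) / mean(bg)


def risk_index(bg):
    """Assumed Kovatchev-style risk index: LBGI + HBGI.

    f(BG) = 1.509 * (ln(BG)^1.084 - 5.381) is ~0 near 112.5 mg/dL,
    negative for low glucose and positive for high glucose.
    """
    low_terms, high_terms = [], []
    for g in bg:
        f = 1.509 * (math.log(g) ** 1.084 - 5.381)
        r = 10.0 * f * f  # symmetric risk on the transformed scale
        low_terms.append(r if f < 0 else 0.0)   # contributes to LBGI
        high_terms.append(r if f > 0 else 0.0)  # contributes to HBGI
    return mean(low_terms) + mean(high_terms)
```

For example, `time_in_range([60, 100, 150, 200])` returns `50.0`, since two of the four readings fall in the target band. The risk transform penalizes hypoglycemia more steeply than the raw scale would, so a reading of 40 mg/dL scores far higher than one of 100 mg/dL.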
Abstract
Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and patient populations. Across eight safe RL algorithms, three diabetes types, and three age groups, shielding achieves Time-in-Range gains of 13-14% for strong baselines such as PPO-Lag and CPO while reducing clinical risk index and glucose variability. Our simulator and…
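The abstract describes shielding as filtering unsafe actions with a learned dynamics model. A minimal sketch of a generic one-step, model-based action filter is shown below. It is not the authors' shield: `shield_action`, the 70-180 mg/dL safe band, the candidate dose grid, and `toy_dynamics` (a hypothetical linear stand-in for a learned glucose model) are all assumptions for illustration.

```python
from typing import Callable, Sequence

# Assumed safe glucose band for the shield, in mg/dL.
SAFE_LOW, SAFE_HIGH = 70.0, 180.0


def shield_action(
    state: float,
    proposed: float,
    candidates: Sequence[float],
    predict_next_bg: Callable[[float, float], float],
) -> float:
    """Hypothetical one-step shield around a policy's proposed insulin dose.

    Keeps `proposed` if the model predicts it stays in the safe band;
    otherwise returns the predicted-safe candidate closest to `proposed`;
    if no candidate is predicted safe, returns the one nearest the band.
    """

    def violation(action: float) -> float:
        # Distance (mg/dL) of the predicted next glucose outside the band.
        bg = predict_next_bg(state, action)
        return max(SAFE_LOW - bg, 0.0, bg - SAFE_HIGH)

    if violation(proposed) == 0.0:
        return proposed
    safe = [a for a in candidates if violation(a) == 0.0]
    if safe:
        return min(safe, key=lambda a: abs(a - proposed))
    return min(candidates, key=violation)


def toy_dynamics(bg: float, insulin: float) -> float:
    """Hypothetical stand-in for a learned model: 1 unit lowers BG by 30 mg/dL."""
    return bg - 30.0 * insulin
```

For example, with glucose at 150 mg/dL and a candidate grid of `[i * 0.5 for i in range(9)]`, a proposed dose of 4.0 units is predicted to drop glucose to 30 mg/dL, so the shield substitutes 2.5 units, the largest safe dose on the grid. Because the filter only intervenes when the model predicts a violation, the policy's own action passes through unchanged whenever it is predicted safe.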
Code & Models
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-cpo (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-cup (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-focops (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-oncrpo (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-pcpo (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-ppolag (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-rcpo (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adolescent-trpolag (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adult-cpo (model)
- 🤗 safe-diabetes-benchmark/safe-diabetes-t1d-adult-cup (model)
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Diabetes Management and Research
