Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies
Runze Yan, Xun Shen, Akifumi Wachi, Sebastien Gros, Anni Zhao, Xiao Hu

TL;DR
This paper introduces OGSRL, a model-based offline reinforcement learning framework for healthcare that ensures safe, reliable policy improvement by constraining exploration within clinically validated regions and safety boundaries, with theoretical guarantees.
Contribution
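OGSRL is presented as the first framework to combine dual constraints (staying within clinically validated regions and respecting safety boundaries) for safe, reliable policy improvement in offline healthcare RL, addressing OOD issues and incorporating domain-specific safety knowledge.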
Findings
OGSRL outperforms existing methods in safe policy improvement.
Theoretical guarantees ensure policies remain in safe, supported regions.
OGSRL effectively leverages full patient state history for better treatment strategies.
Abstract
When offline reinforcement learning (RL) is applied in healthcare, out-of-distribution (OOD) issues pose significant risks, because inappropriate generalization beyond clinical expertise can lead to potentially harmful recommendations. Existing methods such as conservative Q-learning (CQL) attempt to address the OOD issue, but their effectiveness is limited because they constrain only action selection, by suppressing uncertain actions. This action-only regularization imitates clinician actions that prioritize short-term rewards, but it fails to regulate downstream state trajectories, thereby limiting the discovery of improved long-term treatment strategies. To safely improve the policy beyond clinician recommendations while ensuring that state-action trajectories remain in-distribution, we propose Offline Guarded Safe Reinforcement Learning (OGSRL), a theoretically…
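The abstract's idea of keeping state-action pairs in-distribution can be illustrated with a simple support check. The sketch below is not the paper's method: it is a generic k-nearest-neighbour test, and the function names `fit_support_threshold` and `in_support`, the parameters `k` and `quantile`, and the choice of Euclidean distance are all assumptions made for illustration. It flags a concatenated (state, action) vector as unsupported when it lies farther from the logged data than most logged points lie from each other.

```python
import numpy as np


def fit_support_threshold(data, k=5, quantile=0.95):
    """Hypothetical helper: pick a distance cutoff from the logged data.

    For every logged (state, action) row, compute the distance to its
    k-th nearest other row, then return the chosen quantile of those
    distances.
    """
    diffs = data[:, None, :] - data[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting, column 0 is the zero self-distance, so column k
    # holds the distance to the k-th nearest *other* point.
    kth = np.sort(dists, axis=1)[:, k]
    return float(np.quantile(kth, quantile))


def in_support(query, data, threshold, k=5):
    """Hypothetical guard: True if `query` is close to the logged data.

    `query` is one concatenated (state, action) vector. It counts as
    supported when its k-th nearest logged neighbour is within
    `threshold`.
    """
    dists = np.sqrt(((data - query) ** 2).sum(axis=-1))
    kth = np.sort(dists)[k - 1]
    return bool(kth <= threshold)


if __name__ == "__main__":
    # Synthetic stand-in for logged clinician data: 2 state features
    # plus 1 action feature per row (assumed layout).
    rng = np.random.default_rng(0)
    logged = rng.normal(size=(200, 3))
    cutoff = fit_support_threshold(logged)
    print(in_support(np.zeros(3), logged, cutoff))
    print(in_support(np.full(3, 10.0), logged, cutoff))
```

A check of this kind looks at the joint (state, action) vector, unlike an action-only penalty. A policy optimizer could use it to reject or penalize candidate pairs that fall outside the data support.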
Taxonomy
Topics: Pharmaceutical Economics and Policy
Methods: Q-Learning
