Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies

Runze Yan; Xun Shen; Akifumi Wachi; Sebastien Gros; Anni Zhao; Xiao Hu

arXiv:2505.16242·cs.LG·May 23, 2025

TL;DR

This paper introduces OGSRL, a model-based offline reinforcement learning framework for healthcare that ensures safe, reliable policy improvement by constraining exploration within clinically validated regions and safety boundaries, with theoretical guarantees.

Contribution

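OGSRL is presented as the first offline healthcare RL method to combine two constraints for safe, reliable policy improvement: one that keeps the policy within clinically validated, in-distribution regions to address OOD issues, and one that enforces safety boundaries drawn from domain-specific knowledge.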

Findings

01. OGSRL outperforms existing methods in safe policy improvement.

02. Theoretical guarantees ensure policies remain in safe, supported regions.

03. OGSRL effectively leverages full patient state history for better treatment strategies.

Abstract

When offline reinforcement learning (RL) is applied in healthcare, out-of-distribution (OOD) issues pose significant risks, because inappropriate generalization beyond clinical expertise can produce potentially harmful recommendations. Existing methods such as conservative Q-learning (CQL) attempt to address the OOD issue, but their effectiveness is limited because they constrain only action selection, by suppressing uncertain actions. This action-only regularization imitates clinician actions that prioritize short-term rewards, but it fails to regulate downstream state trajectories, thereby limiting the discovery of improved long-term treatment strategies. To safely improve the policy beyond clinician recommendations while ensuring that state-action trajectories remain in-distribution, we propose Offline Guarded Safe Reinforcement Learning (OGSRL), a theoretically…
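The action-only regularization that the abstract attributes to CQL can be illustrated with a minimal sketch of the conservative penalty commonly associated with it: the log-sum-exp of Q-values over all actions minus the Q-value of the logged action. This is not the paper's OGSRL method, whose details are truncated above. The tabular Q-values, the toy dataset, and the function names are hypothetical and chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_table, dataset):
    """Mean of logsumexp_a Q(s, a) - Q(s, a_logged) over logged (s, a) pairs.

    The term is large when actions absent from the data have high Q-values,
    so minimizing it suppresses uncertain actions. It only looks at actions
    in logged states and does not constrain which states the policy visits.
    """
    gaps = [logsumexp(q_table[s]) - q_table[s][a] for s, a in dataset]
    return sum(gaps) / len(gaps)


# Hypothetical toy problem: 2 states, 3 discrete actions.
q_table = [
    [1.0, 0.5, 3.0],  # action 2 is never logged in state 0 but has a high Q
    [0.2, 0.1, 0.0],
]
dataset = [(0, 0), (0, 1), (1, 0)]  # logged (state, action) pairs

penalty = cql_penalty(q_table, dataset)
print(f"conservative penalty: {penalty:.3f}")
```

In this sketch the penalty shrinks when the Q-value of the unlogged action in state 0 is lowered, which is the action-level suppression described above. Nothing in the term constrains downstream state trajectories.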

Videos

Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies (SlidesLive)

Taxonomy

Topics: Pharmaceutical Economics and Policy

Methods: Q-Learning