The hidden risks of temporal resampling in clinical reinforcement learning
Thomas Frost, Hrisheekesh Vaidya, Steve Harris

TL;DR
Resampling clinical data into uniform time intervals before offline reinforcement learning significantly impairs model performance. Datasets that preserve natural decision timings are needed for safe deployment.
Contribution
The paper demonstrates the detrimental impact of data binning on RL model performance in healthcare and emphasizes the importance of using naturally timed clinical data.
Findings
Resampling data at 4-hour intervals worsened model performance by up to 60%.
Retrospective evaluation overestimated model returns by a factor of 1.5 to 3.
Natural clinical timings are crucial for reliable RL model deployment.
Abstract
Reinforcement learning (RL) is a type of artificial intelligence for making optimal choices. In healthcare, researchers generally use offline RL (ORL), where models are trained and evaluated from retrospective observational data. To accommodate inherently irregular clinical records, researchers often resample the data into uniform time intervals before training (known as binning). However, discretised data presents the model with a fictional representation of clinical scenarios, especially where unpredictable decision timings are common. As these models lack robust trial evidence, we chose to explore the effects of this further by conducting an in silico clinical trial using 30 virtual patients with type 1 diabetes from the FDA-approved UVA/Padova simulator. The simulator was modified to include stochastic intervals between decisions and used to generate a training dataset for offline…
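To make the binning step described in the abstract concrete, here is a minimal sketch of resampling irregular records into uniform 4-hour intervals. It is not the authors' pipeline: the record layout `(time_h, glucose, insulin_dose)`, the function name `bin_records`, the aggregation rule (mean glucose, summed insulin per bin), and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict

BIN_HOURS = 4.0  # bin width; matches the 4-hour interval mentioned in the findings


def bin_records(records, bin_hours=BIN_HOURS):
    """Resample irregular (time_h, glucose, insulin_dose) records into uniform bins.

    Hypothetical aggregation rule: mean glucose and total insulin per bin.
    Returns a list of (bin_start_h, mean_glucose, total_dose, n_records).
    """
    bins = defaultdict(list)
    for time_h, glucose, dose in records:
        # Every record in the same window collapses into one bin index.
        bins[int(time_h // bin_hours)].append((glucose, dose))

    binned = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_glucose = sum(g for g, _ in rows) / len(rows)
        total_dose = sum(d for _, d in rows)
        binned.append((idx * bin_hours, mean_glucose, total_dose, len(rows)))
    return binned


# Illustrative values only (not data from the paper): three decisions made
# at irregular times within the first four hours, then one at hour nine.
records = [
    (0.5, 140.0, 2.0),
    (1.0, 180.0, 0.0),
    (3.5, 220.0, 4.0),
    (9.0, 110.0, 1.0),
]

binned = bin_records(records)
for row in binned:
    print(row)
```

In this toy example, three separate dosing decisions in the first window become a single aggregated action, and the empty window between hours 4 and 8 disappears entirely. The model trained on such data never sees the original decision timings, which is the kind of fictional representation the abstract warns about.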
