The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making
Luchen Li, Matthieu Komorowski, Aldo A. Faisal

TL;DR
This paper introduces the Actor Search Tree Critic (ASTC), a novel off-policy reinforcement learning method for partially observable Markov decision processes in healthcare, optimizing treatment policies from real ICU data.
Contribution
It proposes a new off-policy RL framework using belief state planning with heuristic tree search and Gaussian approximations for continuous policies in POMDPs, applied to medical decision making.
Findings
Effective in optimizing sepsis treatment policies from ICU data
Improves patient outcomes through better dosing strategies
Handles partial observability with belief state planning
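The belief-state idea in the last finding can be illustrated with a generic Bayes filter. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a small discrete set of latent patient states, a known transition tensor `T`, and one Gaussian emission component per state (in the spirit of a Gaussian mixture model). All names (`belief_update`, `T`, `means`, `covs`) and all numbers are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal distribution evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    # Solve a linear system instead of inverting the covariance explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def belief_update(belief, action, obs, T, means, covs):
    """One Bayes-filter step over discrete latent states (assumed model)."""
    # Predict: predicted[s'] = sum_s belief[s] * T[action][s, s']
    predicted = belief @ T[action]
    # Correct: weight each state by the likelihood of the observation.
    lik = np.array([gaussian_pdf(obs, m, c) for m, c in zip(means, covs)])
    posterior = predicted * lik
    # Normalise so the belief is a probability distribution again.
    return posterior / posterior.sum()

# Toy model: 2 latent states, 2 actions, 2-dimensional observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.6, 0.4], [0.5, 0.5]]])  # transitions under action 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]

belief = np.array([0.5, 0.5])
new_belief = belief_update(belief, action=0, obs=np.array([2.8, 3.1]),
                           T=T, means=means, covs=covs)
```

In this toy setting the observation lies close to the second component's mean, so the updated belief concentrates on the second state. A planner would then choose actions as a function of `new_belief` rather than of the raw observation.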
Abstract
Off-policy reinforcement learning enables learning a near-optimal policy from suboptimal experience, and thereby opens opportunities for artificial intelligence applications in healthcare. Previous works have mainly framed patient-clinician interactions as Markov decision processes, although true physiological states are not necessarily fully observable from clinical data. We capture this situation with a partially observable Markov decision process, in which an agent optimises its actions in a belief represented as a distribution over patient states inferred from individual history trajectories. A Gaussian mixture model is fitted to the observed data. Moreover, we take into account the fact that nuances in pharmaceutical dosage could presumably result in significantly different effects by modelling a continuous policy through a Gaussian approximator directly in the policy space, i.e. the actor. To address…
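To make the "Gaussian approximator in the policy space" concrete, here is a minimal sketch of a Gaussian policy over a one-dimensional continuous action (for example, a dose). The parametrisation is an assumption made for illustration, since the excerpt does not specify it: the mean is linear in the belief vector and the standard deviation is fixed. The class name `GaussianPolicy` and its methods are hypothetical.

```python
import numpy as np

class GaussianPolicy:
    """Gaussian policy with a linear mean in the belief (assumed form)."""

    def __init__(self, belief_dim, sigma=0.5, seed=0):
        self.w = np.zeros(belief_dim)   # weights of the linear mean
        self.sigma = sigma              # fixed standard deviation
        self.rng = np.random.default_rng(seed)

    def mean(self, belief):
        return float(self.w @ belief)

    def sample(self, belief):
        # Draw a continuous action around the belief-dependent mean.
        return self.rng.normal(self.mean(belief), self.sigma)

    def log_prob(self, belief, action):
        # Log-density of a univariate normal distribution.
        z = (action - self.mean(belief)) / self.sigma
        return -0.5 * z ** 2 - np.log(self.sigma) - 0.5 * np.log(2.0 * np.pi)

    def grad_log_prob(self, belief, action):
        # Score function with respect to w: (a - mu) / sigma^2 * belief.
        return (action - self.mean(belief)) / self.sigma ** 2 * belief

policy = GaussianPolicy(belief_dim=2)
b = np.array([0.3, 0.7])
a = policy.sample(b)
g = policy.grad_log_prob(b, 1.0)
```

An actor update would typically scale `grad_log_prob` by an advantage or value estimate supplied by a critic; how that estimate is obtained is outside the scope of this sketch.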
Taxonomy
Topics: Machine Learning in Healthcare · Sepsis Diagnosis and Treatment · Healthcare Technology and Patient Monitoring
