COOL-MC: Verifying and Explaining RL Policies for Platelet Inventory Management
Dennis Gross

TL;DR
This paper applies COOL-MC to verify and explain a reinforcement learning policy for platelet inventory management, ensuring safety and transparency in a critical healthcare supply chain context.
Contribution
It introduces the first formal verification and explanation of an RL-based platelet inventory policy using probabilistic model checking and feature analysis.
Findings
Policy achieves 2.9% stockout probability within 200 steps
Policy relies primarily on inventory age, more than on other features
Diverse replenishment strategies are employed by the policy
Abstract
Platelets expire within five days. Blood banks face uncertain daily demand and must balance ordering decisions between costly wastage from overstocking and life-threatening shortages from understocking. Reinforcement learning (RL) can learn effective ordering policies for this Markov decision process (MDP), but the resulting neural policies remain black boxes, hindering trust and adoption in safety-critical domains. We apply COOL-MC, a tool that combines RL with probabilistic model checking and explainable RL, to verify and explain a trained policy for the MDP on platelet inventory management inspired by Haijema et al. By constructing a policy-induced discrete-time Markov chain (which includes only the reachable states under the trained policy to reduce memory usage), we verify PCTL properties and provide feature-level explanations. Results show that the trained policy achieves a 2.9%…
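The abstract's central step, building a policy-induced discrete-time Markov chain over reachable states only and then checking a bounded reachability property on it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy inventory model (stock levels 0 to 4, no shelf life, demand of 0 or 2 units with equal probability), the hand-written policy, and all function names are hypothetical. It is neither the paper's MDP nor COOL-MC's API, and the numbers it produces are unrelated to the reported results.

```python
from collections import deque

STOCKOUT = "stockout"                 # absorbing failure state (toy assumption)
MAX_STOCK = 4                         # toy capacity; the full state space is 0..4 plus STOCKOUT
DEMAND = [(0.5, 0), (0.5, 2)]         # (probability, units demanded) -- hypothetical

def policy(state):
    """Stand-in for a trained policy: order 1 unit when stock is 0 or 1."""
    if state == STOCKOUT:
        return 0
    return 1 if state <= 1 else 0

def successors(state, order):
    """Toy MDP dynamics: receive the order, then serve the random demand."""
    if state == STOCKOUT:
        return [(1.0, STOCKOUT)]
    available = min(state + order, MAX_STOCK)
    return [(p, available - d if d <= available else STOCKOUT) for p, d in DEMAND]

def induced_dtmc(initial_state, policy, successors):
    """Breadth-first exploration that stores only states reachable under the policy."""
    transitions = {}
    frontier = deque([initial_state])
    while frontier:
        state = frontier.popleft()
        if state in transitions:
            continue
        transitions[state] = successors(state, policy(state))
        frontier.extend(t for _, t in transitions[state] if t not in transitions)
    return transitions

def bounded_reach_probability(transitions, initial_state, is_target, steps):
    """Probability of reaching a target state within `steps` transitions."""
    prob = {s: 1.0 if is_target(s) else 0.0 for s in transitions}
    for _ in range(steps):
        # One backward step: targets stay at 1, others average over successors.
        prob = {
            s: 1.0 if is_target(s) else sum(p * prob[t] for p, t in transitions[s])
            for s in transitions
        }
    return prob[initial_state]

def is_stockout(state):
    return state == STOCKOUT

dtmc = induced_dtmc(2, policy, successors)
print(sorted(dtmc, key=str))                                  # [0, 1, 2, 'stockout']
print(bounded_reach_probability(dtmc, 2, is_stockout, 3))     # 0.375
```

In this toy setting the induced chain holds four of the six possible states, because stock levels 3 and 4 are never reached under the chosen policy from an initial stock of 2. The last call corresponds to a bounded reachability query, written in PRISM-style notation as `P=? [ F<=3 "stockout" ]`.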
Taxonomy
Topics: Blood donation and transfusion practices · Platelet Disorders and Treatments · Forecasting Techniques and Applications
