Classical and Deep Reinforcement Learning Inventory Control Policies for Pharmaceutical Supply Chains with Perishability and Non-Stationarity
Francesco Stranieri, Chaaben Kouki, Willem van Jaarsveld, Fabio Stella

TL;DR
This paper compares classical and deep reinforcement learning inventory policies for pharmaceutical supply chains with perishability and demand variability, highlighting their strengths, limitations, and the need for integrated approaches.
Contribution
The paper introduces a realistic case study, benchmarks OUT, PIL, and PPO-based DRL policies against a baseline built on human expertise, and proposes procedures for optimizing OUT and PIL policy parameters and for estimating projected inventory levels.
Findings
The PIL policy shows robust and consistent performance.
All three benchmarked policies achieve lower costs than the human-driven BMS baseline, but with higher variability.
DRL with PPO performs well in complex scenarios but is computationally intensive.
Abstract
We study inventory control policies for pharmaceutical supply chains, addressing challenges such as perishability, yield uncertainty, and non-stationary demand, combined with batching constraints, lead times, and lost sales. Collaborating with Bristol-Myers Squibb (BMS), we develop a realistic case study incorporating these factors and benchmark three policies--order-up-to (OUT), projected inventory level (PIL), and deep reinforcement learning (DRL) using the proximal policy optimization (PPO) algorithm--against a BMS baseline based on human expertise. We derive and validate bounds-based procedures for optimizing OUT and PIL policy parameters and propose a methodology for estimating projected inventory levels, which are also integrated into the DRL policy with demand forecasts to improve decision-making under non-stationarity. Compared to a human-driven policy, which avoids lost sales…
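To make the difference between the OUT and PIL ordering rules concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: no perishability, no yield uncertainty, a fixed lead time, and a Monte Carlo estimate of the projected inventory level (the paper's own estimation methodology is not reproduced here). All function names, parameters, and the demand sampler are hypothetical.

```python
import random


def round_down_to_batches(quantity, batch_size):
    """Round a non-negative order quantity down to a whole number of batches."""
    return int(max(0.0, quantity) // batch_size) * batch_size


def out_order(inventory_position, target, batch_size):
    """Order-up-to (OUT): raise on-hand plus pipeline stock toward `target`."""
    return round_down_to_batches(target - inventory_position, batch_size)


def projected_on_hand(on_hand, pipeline, demand_sampler, n_samples=2000, seed=0):
    """Estimate expected on-hand stock just before an order placed now arrives.

    Assumption: pipeline[k] arrives at the start of period k + 1, and the new
    order arrives after len(pipeline) + 1 periods of demand. Unmet demand is
    lost, so stock never drops below zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        stock = on_hand
        for period in range(len(pipeline) + 1):
            if period >= 1:
                stock += pipeline[period - 1]  # scheduled receipt
            stock = max(0, stock - demand_sampler(rng))  # lost sales
        total += stock
    return total / n_samples


def pil_order(on_hand, pipeline, target, batch_size, demand_sampler):
    """Projected inventory level (PIL): raise the projected stock toward `target`."""
    projected = projected_on_hand(on_hand, pipeline, demand_sampler)
    return round_down_to_batches(target - projected, batch_size)


if __name__ == "__main__":
    # Hypothetical demand: uniform integer between 2 and 6 units per period.
    sampler = lambda rng: rng.randint(2, 6)
    print("OUT order:", out_order(inventory_position=13, target=30, batch_size=5))
    print("PIL order:", pil_order(10, [3, 0], target=20, batch_size=5,
                                  demand_sampler=sampler))
```

In this sketch, OUT reacts only to the current inventory position, while PIL accounts for the demand expected to be served (or lost) before the order arrives, which is one reason the two rules can order different quantities from the same state.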
Taxonomy
Topics: Supply Chain and Inventory Management · Fault Detection and Control Systems
Methods: Entropy Regularization · Proximal Policy Optimization
