Classical and Deep Reinforcement Learning Inventory Control Policies for Pharmaceutical Supply Chains with Perishability and Non-Stationarity

Francesco Stranieri; Chaaben Kouki; Willem van Jaarsveld; Fabio Stella

arXiv:2501.10895 · cs.AI · January 22, 2025

TL;DR

This paper compares classical and deep reinforcement learning inventory policies for pharmaceutical supply chains with perishability and demand variability, highlighting their strengths, limitations, and the need for integrated approaches.

Contribution

It introduces a realistic case study, benchmarks multiple policies including DRL with PPO, and proposes methods for optimizing and estimating inventory policies under complex conditions.

Findings

01. The projected inventory level (PIL) policy shows robust and consistent performance.

02. All three benchmarked policies achieve lower costs than the human-driven baseline, but with higher variability.

03. Deep reinforcement learning (DRL) with PPO performs well in complex scenarios but is computationally intensive.

Abstract

We study inventory control policies for pharmaceutical supply chains, addressing challenges such as perishability, yield uncertainty, and non-stationary demand, combined with batching constraints, lead times, and lost sales. Collaborating with Bristol-Myers Squibb (BMS), we develop a realistic case study incorporating these factors and benchmark three policies--order-up-to (OUT), projected inventory level (PIL), and deep reinforcement learning (DRL) using the proximal policy optimization (PPO) algorithm--against a BMS baseline based on human expertise. We derive and validate bounds-based procedures for optimizing OUT and PIL policy parameters and propose a methodology for estimating projected inventory levels, which are also integrated into the DRL policy with demand forecasts to improve decision-making under non-stationarity. Compared to a human-driven policy, which avoids lost sales…
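To make the ingredients named in the abstract concrete, here is a minimal sketch of an order-up-to (OUT) policy under batching, a fixed lead time, lost sales, and a fixed shelf life. It is an illustration under stated assumptions, not the authors' model: the function names, the round-up-to-whole-batches rule, the oldest-first issuing rule, and the event order within a period are all hypothetical choices, and yield uncertainty and non-stationary demand forecasts are left out.

```python
import math
from collections import deque


def order_up_to_quantity(inventory_position, target_level, batch_size):
    """Smallest whole number of batches that lifts the position to at least target_level.

    Rounding up to a full batch is an assumption made for this sketch.
    """
    shortfall = target_level - inventory_position
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / batch_size) * batch_size


def simulate_out_policy(demands, target_level, batch_size, lead_time, shelf_life):
    """Simulate an OUT policy; unmet demand is lost and expired stock is discarded."""
    # on_hand[i] holds units with i + 1 periods of shelf life left (index 0 expires first).
    on_hand = [0] * shelf_life
    # pipeline holds outstanding orders, oldest first.
    pipeline = deque([0] * lead_time)
    lost_sales = 0
    expired = 0

    for demand in demands:
        # 1. Order based on the inventory position (on hand plus on order).
        position = sum(on_hand) + sum(pipeline)
        pipeline.append(order_up_to_quantity(position, target_level, batch_size))

        # 2. Receive the order placed lead_time periods ago as fresh stock.
        on_hand[-1] += pipeline.popleft()

        # 3. Serve demand from the oldest stock first; any remainder is lost.
        remaining = demand
        for age in range(shelf_life):
            served = min(on_hand[age], remaining)
            on_hand[age] -= served
            remaining -= served
        lost_sales += remaining

        # 4. Discard stock that reached the end of its shelf life, then age the rest.
        expired += on_hand[0]
        on_hand = on_hand[1:] + [0]

    return {"lost_sales": lost_sales, "expired": expired, "on_hand": sum(on_hand)}


result = simulate_out_policy(
    demands=[5, 5, 5], target_level=10, batch_size=4, lead_time=1, shelf_life=2
)
print(result)
```

In this toy run the first period's demand is lost because nothing has arrived yet, and a few units expire in the last period because the batch constraint forced an order larger than the shortfall. This is the kind of trade-off between lost sales and waste that the benchmarked policies have to balance.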

Taxonomy

Topics: Supply Chain and Inventory Management · Fault Detection and Control Systems

Methods: Entropy Regularization · Proximal Policy Optimization