Deep Policy Iteration with Integer Programming for Inventory Management
Pavithra Harsha, Ashish Jagmohan, Jayant Kalagnanam, Brian Quanz, Divya Singhvi

TL;DR
This paper introduces a deep policy iteration framework combining neural networks and mathematical programming to optimize complex inventory management problems with combinatorial actions, outperforming existing RL methods.
Contribution
The paper presents a novel Programmable Actor Reinforcement Learning (PARL) approach that integrates neural networks with integer programming for inventory replenishment optimization.
Findings
PARL outperforms state-of-the-art RL algorithms by up to 14.7% on average.
The method effectively manages inventory costs in constrained settings.
RL approaches learn near-optimal policies in tractable cases.
Abstract
We present a Reinforcement Learning (RL) based framework for optimizing long-term discounted reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, e.g., network inventory replenishment, where managers have to deal with uncertain demand, lost sales, and capacity constraints that result in more complex feasible action spaces. Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep policy iteration method that leverages neural networks (NNs) to approximate the value function and combines it with mathematical programming (MP) and sample average approximation (SAA) to solve the per-step action optimally while accounting for combinatorial action spaces and state-dependent constraint sets. We show how the proposed methodology can be applied to complex inventory…
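The per-step action selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-node inventory with lost sales, replaces the integer program with brute-force enumeration over integer order quantities, and takes the value function as an arbitrary callable standing in for the trained neural network. The function name `saa_greedy_action` and all cost parameters are hypothetical.

```python
def saa_greedy_action(inventory, capacity, demand_samples, value_fn,
                      price=5.0, order_cost=2.0, holding_cost=0.5, gamma=0.95):
    """Return (order, score): the integer order quantity maximizing the
    sample average of one-step reward plus discounted next-state value.

    Assumptions (not from the paper): single node, lost sales, and plain
    enumeration in place of an integer-programming solver.
    """
    best_action, best_score = 0, float("-inf")
    # State-dependent constraint: on-hand stock plus order must fit in capacity.
    max_order = max(capacity - inventory, 0)
    for order in range(max_order + 1):
        total = 0.0
        for demand in demand_samples:      # SAA: average over sampled demands
            stock = inventory + order
            sales = min(stock, demand)     # lost sales: unmet demand is dropped
            next_inventory = stock - sales
            reward = (price * sales
                      - order_cost * order
                      - holding_cost * next_inventory)
            total += reward + gamma * value_fn(next_inventory)
        score = total / len(demand_samples)
        if score > best_score:
            best_action, best_score = order, score
    return best_action, best_score


# Usage with a zero value function (a placeholder for the learned NN).
action, score = saa_greedy_action(
    inventory=0, capacity=10, demand_samples=[3, 3, 3],
    value_fn=lambda s: 0.0,
)
```

Enumeration is only workable here because the action is one bounded integer. With network-wide, combinatorial actions the candidate set grows too fast to enumerate, which is the setting where the abstract's mathematical programming step applies.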
Taxonomy
Topics: Supply Chain and Inventory Management · Scheduling and Optimization Algorithms
Methods: Balanced Selection
