Deep Policy Iteration with Integer Programming for Inventory Management

Pavithra Harsha; Ashish Jagmohan; Jayant Kalagnanam; Brian Quanz; Divya Singhvi

arXiv:2112.02215·cs.LG·January 9, 2025

Open Access

TL;DR

This paper introduces a deep policy iteration framework combining neural networks and mathematical programming to optimize complex inventory management problems with combinatorial actions, outperforming existing RL methods.

Contribution

The paper presents a novel Programmable Actor Reinforcement Learning (PARL) approach that integrates neural networks with integer programming for inventory replenishment optimization.

Findings

1. PARL outperforms state-of-the-art RL algorithms by up to 14.7% on average.

2. The method effectively manages inventory costs in constrained settings.

3. RL approaches learn near-optimal policies in tractable cases.

Abstract

We present a Reinforcement Learning (RL) based framework for optimizing long-term discounted reward problems with large combinatorial action spaces and state-dependent constraints. These characteristics are common to many operations management problems, e.g., network inventory replenishment, where managers have to deal with uncertain demand, lost sales, and capacity constraints that result in more complex feasible action spaces. Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep policy iteration method that leverages neural networks (NNs) to approximate the value function and combines it with mathematical programming (MP) and sample average approximation (SAA) to solve for the per-step action optimally while accounting for combinatorial action spaces and state-dependent constraint sets. We show how the proposed methodology can be applied to complex inventory…
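The per-step action selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract describes solving the step with mathematical programming; here plain enumeration over integer order quantities is swapped in, which is only workable for tiny instances. The function name `greedy_action`, the cost parameters, the lost-sales reward model, and the `storage_limit` constraint are all assumptions made for illustration, and `value_fn` stands in for a trained value network.

```python
from itertools import product


def greedy_action(state, demand_samples, value_fn, capacity, storage_limit,
                  gamma=0.9, price=5.0, order_cost=1.0, holding_cost=0.5):
    """Pick the integer order vector with the best sample-average score.

    All names and cost parameters are hypothetical. `state` is a tuple of
    on-hand stock per node, `demand_samples` is a list of demand tuples,
    and `value_fn` maps a next-state tuple to an estimated value.
    """
    best_action, best_score = None, float("-inf")
    # Enumerate candidate actions (stand-in for an integer program).
    for action in product(range(capacity + 1), repeat=len(state)):
        # Shared ordering capacity across all nodes.
        if sum(action) > capacity:
            continue
        # State-dependent constraint: stock after ordering must fit storage.
        if any(s + a > storage_limit for s, a in zip(state, action)):
            continue
        total = 0.0
        for demand in demand_samples:
            reward = -order_cost * sum(action)
            next_state = []
            for stock, order, d in zip(state, action, demand):
                available = stock + order
                sales = min(available, d)      # unmet demand is lost
                leftover = available - sales
                reward += price * sales - holding_cost * leftover
                next_state.append(leftover)
            total += reward + gamma * value_fn(tuple(next_state))
        score = total / len(demand_samples)    # sample average approximation
        if score > best_score:
            best_action, best_score = action, score
    return best_action, best_score
```

A call such as `greedy_action((0, 0), [(2, 1), (2, 1)], lambda s: 0.0, capacity=3, storage_limit=10)` evaluates every feasible order vector for a two-node network against the given demand samples. Enumeration grows exponentially with the number of nodes, which is the scaling problem that motivates an integer-programming formulation.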

Taxonomy

Topics: Supply Chain and Inventory Management · Scheduling and Optimization Algorithms

Methods: Balanced Selection