A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy

Jiameng Lyu; Jinxing Xie; Shilin Yuan; Yuan Zhou

arXiv:2408.16181·math.OC·August 30, 2024·2 cites

TL;DR

This paper introduces a flexible minibatch-SGD-based meta-policy for inventory systems that handles infeasible target inventory levels, comes with provable regret bounds, and is shown in extensive experiments to be broadly applicable and efficient.

Contribution

The paper proposes a novel minibatch-SGD meta-policy that is adaptable to various inventory systems, providing theoretical regret bounds and practical effectiveness in complex scenarios.

Findings

01

Achieves $\tilde{O}(\sqrt{T})$ regret for convex cases.

02

Achieves $O(\log T)$ regret for strongly convex cases.

03

Demonstrates high computational efficiency and low variance in diverse inventory problems.

Abstract

Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often runs into the pitfall of an infeasible target inventory level, that is, a target lower than the current inventory level. Several recent works (e.g., Huh and Rusmevichientong (2009), Shi et al. (2016)) have successfully resolved this issue in various inventory systems. However, their techniques are rather sophisticated and difficult to apply to more complicated scenarios such as multi-product and multi-constraint inventory systems. In this paper, we address the infeasible-target-inventory-level issue from a new technical perspective: we propose a novel minibatch-SGD-based meta-policy. Our meta-policy is flexible enough to be applied to a general inventory systems framework covering a wide range of inventory management problems with myopic clairvoyant optimal…
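
To make the infeasible-target issue concrete, here is a minimal Python sketch of a minibatch-SGD order-up-to rule for a single-product newsvendor with non-perishable inventory. It is not the paper's meta-policy: the function name, the epoch schedule (epoch k lasts k periods), the step size, the lost-sales dynamics, and the assumption that demand is fully observed are all illustrative choices made here. The target level is held fixed within each epoch and updated only at the epoch's end from the averaged subgradient; periods where on-hand inventory exceeds the target are counted as infeasible.

```python
import random


def minibatch_sgd_newsvendor(demands, h=1.0, b=3.0, s0=0.0, s_max=100.0, step=10.0):
    """Illustrative minibatch SGD on an order-up-to level (hypothetical helper).

    Assumptions (not taken from the paper): demand is fully observed,
    unmet demand is lost, leftover stock carries over, epoch k spans
    k periods, and the step size in epoch k is step / sqrt(k).
    """
    s = s0            # target order-up-to level, fixed within an epoch
    x = 0.0           # on-hand inventory at the start of the period
    t = 0             # index of the next period
    epoch = 1
    infeasible = 0    # periods where the target lies below on-hand stock
    total_cost = 0.0
    n = len(demands)

    while t < n:
        batch = min(epoch, n - t)
        grad_sum = 0.0
        for d in demands[t:t + batch]:
            if x > s:
                infeasible += 1
            y = max(x, s)  # inventory cannot be lowered, only raised to s
            total_cost += h * max(y - d, 0.0) + b * max(d - y, 0.0)
            # Subgradient of the newsvendor cost evaluated at the target s.
            grad_sum += h if s > d else -b
            x = max(y - d, 0.0)
        t += batch
        eta = step / epoch ** 0.5
        # Projected update using the gradient averaged over the epoch.
        s = min(max(s - eta * grad_sum / batch, 0.0), s_max)
        epoch += 1

    return s, total_cost, infeasible


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.uniform(0.0, 100.0) for _ in range(5000)]
    print(minibatch_sgd_newsvendor(sample))
```

With uniform demand on [0, 100] and costs h = 1, b = 3, the newsvendor optimum is the b/(h+b) = 0.75 quantile, i.e. a level of 75, which the final target in this sketch should approach. Because the target changes only once per epoch, the number of updates grows much more slowly than the horizon, which limits how often a lowered target can fall below on-hand inventory.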

Taxonomy

Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Data Stream Mining Techniques