A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy
Jiameng Lyu, Jinxing Xie, Shilin Yuan, Yuan Zhou

TL;DR
This paper introduces a flexible minibatch-SGD-based meta-policy for inventory systems that handles infeasible target inventory levels (targets below the current inventory level), comes with provable regret bounds, and is shown in experiments to be efficient across a broad range of inventory problems.
Contribution
The paper proposes a novel minibatch-SGD meta-policy that adapts to a variety of inventory systems, with theoretical regret bounds and demonstrated effectiveness in complex scenarios.
Findings
Achieves $\tilde{O}(\sqrt{T})$ regret for convex cases.
Achieves $O(\log T)$ regret for strongly convex cases.
Demonstrates high computational efficiency and low variance in diverse inventory problems.
Abstract
Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often faces the pitfall of an infeasible target inventory level that is lower than the current inventory level. Several recent works (e.g., Huh and Rusmevichientong (2009), Shi et al. (2016)) have successfully resolved this issue in various inventory systems. However, their techniques are rather sophisticated and difficult to apply to more complicated scenarios such as multi-product and multi-constraint inventory systems. In this paper, we address the infeasible-target-inventory-level issue from a new technical perspective: we propose a novel minibatch-SGD-based meta-policy. Our meta-policy is flexible enough to be applied to a general inventory-systems framework covering a wide range of inventory management problems with myopic clairvoyant optimal…
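To make the infeasible-target issue concrete, here is a minimal sketch of a minibatch update in a single-product, lost-sales setting. It is not the authors' meta-policy: the function name `minibatch_sgd_policy`, the cost parameters `h` (holding) and `b` (lost-sales penalty), the `step / sqrt(k)` step size, and the rule of skipping gradient samples in periods where the target is infeasible are all assumptions made for illustration. The target level is held fixed within a batch, so leftover inventory has time to drain before the next update.

```python
import math

def minibatch_sgd_policy(demands, h=1.0, b=3.0, batch_size=5,
                         step=4.0, y0=0.0, y_max=100.0):
    """Illustrative sketch only (hypothetical names and update rule)."""
    y = y0          # current target (order-up-to) level
    x = 0.0         # on-hand inventory at the start of the period
    grads = []      # subgradient samples collected in the current batch
    k = 1           # index of the next update, used in the step size
    history = []    # (target, inventory level after ordering) per period
    for t, d in enumerate(demands, start=1):
        feasible = x <= y            # the target is reachable by ordering up
        level = max(x, y)            # inventory cannot be ordered down
        leftover = max(level - d, 0.0)
        if feasible:
            # Subgradient of h*(y-d)^+ + b*(d-y)^+ at y; its sign is
            # observable from whether any stock is left over.
            grads.append(h if leftover > 0 else -b)
        history.append((y, level))
        x = leftover
        if t % batch_size == 0:
            if grads:                # update only if the batch has samples
                g = sum(grads) / len(grads)
                y = min(max(y - step / math.sqrt(k) * g, 0.0), y_max)
                k += 1
            grads = []
    return y, history
```

In this sketch, a period in which the on-hand inventory exceeds the target contributes no gradient sample, and the target stays unchanged if an entire batch is infeasible. A larger `batch_size` gives the inventory more periods to fall back to the target.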
Taxonomy
Topics: Advanced Control Systems Optimization · Reinforcement Learning in Robotics · Data Stream Mining Techniques
