A Single-Sample Polylogarithmic Regret Bound for Nonstationary Online Linear Programming
Haoran Xu, Owen Shen, Peter Glynn, Yinyu Ye, Patrick Jaillet

TL;DR
This paper introduces a re-solving algorithm for nonstationary online linear programming that achieves polylogarithmic regret, $O((\log n)^2)$, using only one sample from each distribution, even though the distributions may differ from order to order.
Contribution
The paper presents a novel re-solving algorithm that integrates a dynamic programming perspective with dual-based frameworks, achieving $O((\log n)^2)$ regret in nonstationary settings with only one sample per distribution.
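To make the idea of re-solving concrete, here is a minimal sketch of a generic dual-price re-solving heuristic for a single resource. It is not the paper's algorithm: the function names (`knapsack_lp`, `resolve_policy`, `draw`), the drifting uniform distributions, the horizon `n = 200`, and the budget `0.5 * n` are all assumptions made for illustration. With one resource, the LP relaxation is a fractional knapsack, so it can be solved by sorting on the reward-to-consumption ratio, and the ratio of the marginal order is the dual price.

```python
import random


def knapsack_lp(orders, budget):
    """Solve max sum r_i x_i s.t. sum a_i x_i <= budget, 0 <= x_i <= 1.

    `orders` is a list of (reward, consumption) pairs with consumption > 0.
    Returns (optimal value, dual price of the budget constraint).
    Greedy by reward/consumption ratio is optimal for this single-resource LP.
    """
    value, remaining = 0.0, budget
    for r, a in sorted(orders, key=lambda o: o[0] / o[1], reverse=True):
        if a <= remaining:
            value += r
            remaining -= a
        else:
            # Marginal (fractional) order: its ratio is the shadow price.
            value += r * remaining / a
            return value, r / a
    return value, 0.0  # budget is not binding, so the resource is free


def resolve_policy(arrivals, samples, budget):
    """Hypothetical re-solving heuristic (an assumption, not the paper's method).

    At each period t, re-solve the LP on the single samples for periods
    t..n-1 with the remaining budget, then accept the arriving order if its
    reward covers the resulting dual price times its consumption.
    """
    reward, remaining = 0.0, budget
    for t, (r, a) in enumerate(arrivals):
        _, price = knapsack_lp(samples[t:], remaining)
        if a <= remaining and r >= price * a:
            reward += r
            remaining -= a
    return reward, remaining


def draw(n, rng):
    # Independent but not identically distributed: the reward scale drifts with t.
    return [(rng.uniform(0.0, 1.0 + t / n), rng.uniform(0.5, 1.5)) for t in range(n)]


rng = random.Random(0)
n = 200
budget = 0.5 * n              # assumed endowment, proportional to the horizon
samples = draw(n, rng)        # one sample per period, known up front
arrivals = draw(n, rng)       # the orders actually revealed online

online_reward, leftover = resolve_policy(arrivals, samples, budget)
hindsight_value, _ = knapsack_lp(arrivals, budget)
regret = hindsight_value - online_reward
print(f"hindsight LP value: {hindsight_value:.3f}")
print(f"online reward:      {online_reward:.3f}")
print(f"regret:             {regret:.3f}")
```

The hindsight LP value upper-bounds the reward of any feasible accept/reject policy, so the printed regret is nonnegative. The sketch says nothing about how fast regret grows with `n`; the $O((\log n)^2)$ guarantee belongs to the paper's algorithm, not to this heuristic.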
Findings
Achieves polylogarithmic regret in nonstationary environments.
Works with only one sample per distribution.
Bridges the gap between stationary and volatile resource allocation.
Abstract
We study nonstationary Online Linear Programming (OLP), where orders arrive sequentially with reward-resource consumption pairs that form a sequence of independent, but not necessarily identically distributed, random vectors. At the beginning of the planning horizon, the decision-maker is provided with a resource endowment that is sufficient to fulfill a significant portion of the requests. The decision-maker seeks to maximize the expected total reward by making immediate and irrevocable acceptance or rejection decisions for each order, subject to this resource endowment. We focus on the challenging single-sample setting, where only one sample from each of the distributions is available at the start of the planning horizon. We propose a novel re-solving algorithm that integrates a dynamic programming perspective with the dual-based frameworks traditionally employed in…
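For orientation, the setting described in the abstract can be written in a standard OLP form. The notation below is an assumption, not taken from the paper: $n$ is the number of orders, $(r_t, a_t)$ is the reward and resource-consumption vector of order $t$, $b$ is the resource endowment, and $x_t^{\pi}$ is the decision of an online policy $\pi$. The benchmark shown is the common hindsight LP relaxation; the paper may measure regret against a different benchmark.

```latex
% Hindsight benchmark: all n orders are known in advance.
\[
R_n^{*} \;=\; \max_{x}\ \sum_{t=1}^{n} r_t x_t
\quad \text{s.t.} \quad \sum_{t=1}^{n} a_t x_t \le b,
\qquad 0 \le x_t \le 1,\ \ t = 1,\dots,n.
\]
% Online policy: x_t^\pi \in \{0,1\} is chosen immediately and irrevocably,
% using only the orders seen so far and the one sample per distribution.
\[
\mathrm{Regret}_n(\pi) \;=\; \mathbb{E}\!\left[ R_n^{*} - \sum_{t=1}^{n} r_t x_t^{\pi} \right],
\qquad \sum_{t=1}^{n} a_t x_t^{\pi} \le b .
\]
```

Under this reading, the pairs $(r_t, a_t)$ are independent across $t$ but need not share a distribution, and a polylogarithmic bound means the regret grows like $O((\log n)^2)$ rather than polynomially in $n$.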
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization
