Non-Stationary Online Resource Allocation: Learning from a Single Sample
Yiding Feng, Jiashuo Jiang, Yige Wang

TL;DR
This paper introduces an online resource allocation method that handles arbitrary non-stationarity with minimal offline data (one sample per period), achieving regret that is sublinear in the horizon and, in the best case, poly-logarithmic.
Contribution
It proposes a type-dependent, quantile-based meta-policy that operates with only one sample per period, handles non-stationarity without constraints on how much the arrival distributions may vary, and achieves the first poly-logarithmic regret guarantee for non-stationary multi-resource allocation.
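To make "type-dependent, quantile-based" concrete, here is a minimal sketch under stated assumptions: each query type has a target service probability q (the values below are made up, and the paper's procedure for choosing them is not reproduced), and the acceptance threshold for a type is the empirical upper-q quantile of that type's sampled rewards. The names `type_thresholds` and `accept` are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict

def type_thresholds(samples, target_prob):
    """Per-type reward thresholds from reward-observed samples.

    samples: list of (query_type, reward) pairs, e.g. one pair per period.
    target_prob: hypothetical dict query_type -> target service probability q.
    A type-j query is meant to be served when its reward falls in roughly the
    top q_j fraction of the sampled rewards of type j.
    """
    by_type = defaultdict(list)
    for qtype, reward in samples:
        by_type[qtype].append(reward)
    thresholds = {}
    for qtype, q in target_prob.items():
        rewards = sorted(by_type[qtype], reverse=True)
        k = int(q * len(rewards))  # number of sampled rewards treated as "good enough"
        # k-th largest sampled reward; with no usable samples, never serve the type.
        thresholds[qtype] = rewards[k - 1] if k >= 1 else float("inf")
    return thresholds

def accept(qtype, reward, thresholds):
    # Types without a threshold are rejected.
    return reward >= thresholds.get(qtype, float("inf"))

samples = [("A", 5.0), ("A", 1.0), ("A", 3.0), ("A", 4.0), ("B", 2.0), ("B", 8.0)]
thresholds = type_thresholds(samples, {"A": 0.5, "B": 1.0})
print(thresholds)  # {'A': 4.0, 'B': 2.0}
```

Keeping one threshold per type is what lets types with very different reward scales be filtered separately; a single pooled threshold would tend to starve low-reward types entirely.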
Findings
Static threshold policy attains $\tilde{O}(\sqrt{T})$ regret with reward-observed samples.
Partially adaptive policy attains $\tilde{O}(\sqrt{T})$ regret under a minimum arrival-probability assumption.
Fully adaptive policy attains $O((\log T)^3)$ regret in non-stationary multi-resource allocation.
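The difference between a static and an adaptive threshold can be illustrated with a deliberately simplified toy, which is not the paper's policy: one resource, one query per period with unit consumption, a single query type, and one offline reward sample per period. The static rule fixes one threshold up front (the capacity-th largest sample); the adaptive rule re-solves each period from the remaining capacity and the samples of the remaining periods, without the careful rounding the paper uses. The drifting reward range, the hindsight-optimum benchmark, and all function names are assumptions made for illustration, and nothing here is claimed to achieve the bounds listed above.

```python
import random

def kth_largest(values, k):
    # Threshold that admits roughly k of the given values.
    if k <= 0:
        return float("inf")   # no capacity: admit nothing
    if k > len(values):
        return float("-inf")  # more capacity than periods left: admit everything
    return sorted(values, reverse=True)[k - 1]

def simulate(samples, rewards, capacity, adaptive):
    """Toy single-resource run; returns (total reward, number of queries served).

    samples[t]: the one offline sample for period t (hypothetical data).
    rewards[t]: realized reward of the period-t query.
    """
    remaining, total = capacity, 0.0
    static_tau = kth_largest(samples, capacity)  # fixed once, before the horizon starts
    for t, reward in enumerate(rewards):
        if remaining == 0:
            break
        # Adaptive: re-solve using remaining capacity and samples of periods t..T-1.
        tau = kth_largest(samples[t:], remaining) if adaptive else static_tau
        if reward >= tau:
            remaining -= 1
            total += reward
    return total, capacity - remaining

rng = random.Random(0)
T, capacity = 200, 40
# Hypothetical non-stationary environment: the reward range drifts upward over time.
highs = [1.0 + 2.0 * t / T for t in range(T)]
samples = [rng.uniform(0.0, h) for h in highs]
rewards = [rng.uniform(0.0, h) for h in highs]
opt = sum(sorted(rewards, reverse=True)[:capacity])  # hindsight benchmark (assumption)
results = {name: simulate(samples, rewards, capacity, flag)
           for name, flag in [("static", False), ("adaptive", True)]}
for name, (got, served) in results.items():
    print(f"{name}: served {served}, reward {got:.2f}, gap to hindsight {opt - got:.2f}")
```

Varying the seed, horizon, or drift pattern is an easy way to explore when re-solving helps in this toy, but it says nothing about the paper's type-dependent, multi-resource policies.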
Abstract
We study online resource allocation under non-stationary demand with a minimum offline data requirement. In this problem, a decision-maker must allocate multiple types of resources to sequentially arriving queries over a finite horizon. Each query belongs to one of a finite set of types, with fixed resource consumption and a stochastic reward drawn from an unknown, type-specific distribution. Critically, the environment exhibits arbitrary non-stationarity (arrival distributions may shift unpredictably), while the algorithm requires only one historical sample per period to operate effectively. We distinguish two settings based on sample informativeness: (i) reward-observed samples, containing both the query type and the reward realization, and (ii) the more challenging type-only samples, revealing only the query type. We propose a novel type-dependent quantile-based meta-policy that decouples…
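A minimal sketch of the model mechanics the abstract describes: each query type has a fixed consumption vector over several resources, a query can only be served if every resource still has enough capacity, and a type-only sample reveals the type but no reward. The instance (`CONSUMPTION`, two resources, the `Query` record) and the helper names are hypothetical; the code is a feasibility check plus a sample summary, not an allocation policy.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    qtype: str
    reward: float

# Hypothetical instance: two resources, fixed consumption vector per query type.
CONSUMPTION = {"small": (1, 0), "large": (2, 1)}

def try_serve(budget, query):
    """Serve the query only if every resource has enough remaining capacity.
    Returns the updated budget and the collected reward (0.0 if rejected)."""
    need = CONSUMPTION[query.qtype]
    if all(b >= n for b, n in zip(budget, need)):
        return tuple(b - n for b, n in zip(budget, need)), query.reward
    return budget, 0.0

def type_frequencies(type_only_samples):
    """Type-only samples carry no reward, so per type we can only extract a
    count. Pooling across periods gives an average frequency over the horizon,
    not any single period's (possibly shifted) arrival distribution."""
    counts = Counter(type_only_samples)
    n = len(type_only_samples)
    return {j: c / n for j, c in counts.items()}

budget = (3, 1)
budget, r1 = try_serve(budget, Query("large", 5.0))  # served: budget -> (1, 0)
budget, r2 = try_serve(budget, Query("large", 7.0))  # rejected: needs (2, 1)
budget, r3 = try_serve(budget, Query("small", 1.5))  # served: budget -> (0, 0)
print(budget, r1 + r2 + r3)  # (0, 0) 6.5
```

Note that accepting every feasible query, as this usage does, is exactly what a good policy avoids: the rejected query above had the highest reward of the three.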
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Caching and Content Delivery
