Non-Stationary Online Resource Allocation: Learning from a Single Sample

Yiding Feng; Jiashuo Jiang; Yige Wang

arXiv:2602.18114 · cs.LG · February 23, 2026


TL;DR

This paper introduces an online resource allocation method that handles arbitrary non-stationarity with minimal offline data (one historical sample per period) and achieves near-optimal regret bounds, down to poly-logarithmic regret in the horizon.

Contribution

It proposes a type-dependent, quantile-based meta-policy that needs only one historical sample per period, handles non-stationarity without constraints on how much the arrival distributions vary, and yields the first poly-logarithmic regret guarantee for non-stationary multi-resource allocation.
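The summary does not spell out how the meta-policy works. As a rough illustration of what a type-dependent quantile rule can look like, the Python sketch below assumes each type has already been assigned a target acceptance share (for example, from an LP) and accepts a query of that type only if its reward clears the empirical (1 - share) quantile of that type's reward samples. The function names and the source of the shares are hypothetical; this is not the authors' meta-policy.

```python
import math
from typing import Dict, List, Sequence


def quantile_threshold(rewards: Sequence[float], accept_frac: float) -> float:
    """Empirical (1 - accept_frac)-quantile of one type's reward samples.

    Accepting rewards >= the returned value keeps roughly the top
    `accept_frac` share of the samples.
    """
    if accept_frac <= 0.0 or not rewards:
        return math.inf   # zero share or no data: reject every query of this type
    if accept_frac >= 1.0:
        return -math.inf  # full share: accept every query of this type
    s = sorted(rewards)
    idx = min(len(s) - 1, math.floor((1.0 - accept_frac) * len(s)))
    return s[idx]


def build_thresholds(samples: Dict[str, List[float]],
                     accept_frac: Dict[str, float]) -> Dict[str, float]:
    # One threshold per query type: this is what makes the rule type-dependent.
    return {j: quantile_threshold(samples.get(j, []), accept_frac.get(j, 0.0))
            for j in accept_frac}


def accept(query_type: str, reward: float, thresholds: Dict[str, float]) -> bool:
    # Unknown types get an infinite threshold and are rejected.
    return reward >= thresholds.get(query_type, math.inf)
```

For example, with reward samples 1.0, ..., 10.0 for type "A" and a target share of 0.5, the threshold is 6.0, so the top half of that type's rewards would be accepted.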

Findings

01

Static threshold policy attains $\tilde{O}(\sqrt{T})$ regret with reward-observed samples.

02

Partially adaptive policy achieves $\tilde{O}(T)$ regret under minimal-arrival assumptions.

03

Fully adaptive policy attains $O((\log T)^3)$ regret in non-stationary multi-resource allocation.
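For finding 01, a simple single-resource stand-in for a static threshold built from reward-observed samples is a bid price: pack the historical (type, reward) samples greedily by reward per unit of resource, take the density of the last sample that fits as a fixed threshold, and use it for the whole horizon. The sketch assumes one resource, a known consumption per type, and the hypothetical names `static_price` and `run_static_policy`; the paper's policy covers multiple resources and may construct its threshold differently.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, float]  # (query type, observed reward)


def static_price(samples: List[Sample], consumption: Dict[str, float],
                 capacity: float) -> float:
    """Fixed reward-per-unit threshold computed once from the historical samples.

    Samples are packed greedily by reward density r / a_j; the density of the
    last sample that still fits under `capacity` becomes the threshold.
    """
    ranked = sorted(((r / consumption[j], consumption[j]) for j, r in samples),
                    reverse=True)  # highest reward density first
    used, price = 0.0, 0.0
    for density, a in ranked:
        if used + a > capacity:
            # Capacity binds: last packed density is the threshold
            # (0.0 if not even the best sample fits).
            return price
        used += a
        price = density
    return 0.0  # capacity never binds on the samples: accept anything that fits


def run_static_policy(arrivals: List[Sample], consumption: Dict[str, float],
                      capacity: float, price: float) -> float:
    """Online phase: accept a query iff it fits and its density meets the threshold."""
    remaining, total = capacity, 0.0
    for j, r in arrivals:
        a = consumption[j]
        if a <= remaining and r / a >= price:
            remaining -= a
            total += r
    return total
```

Because the threshold is computed once, it cannot react when later arrivals differ from what the samples suggested; the adaptive policies in findings 02 and 03 re-use information as the horizon unfolds.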
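Finding 03 refers to a fully adaptive policy. A common way to make a policy adaptive is to re-solve a fluid LP in every period with the remaining capacity and then round the fractional solution. The sketch below does this for a single resource with type-only samples: at time t it treats the historical types of the remaining periods plus the current arrival as the forecast of remaining demand, solves the resulting fractional knapsack, and accepts the arriving type if the LP serves at least half of its forecast count. The known per-type mean rewards, the 1/2 rounding rule, and all names are assumptions for illustration, not the paper's resolving policy or its rounding scheme.

```python
from typing import Dict, List


def fluid_allocation(counts: Dict[str, float], reward: Dict[str, float],
                     consumption: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Greedy (optimal) solution of the single-resource fluid LP
        max sum_j reward[j] * x[j]
        s.t. sum_j consumption[j] * x[j] <= capacity,  0 <= x[j] <= counts[j],
    assuming positive rewards and consumptions (a fractional knapsack)."""
    x = {j: 0.0 for j in counts}
    left = capacity
    for j in sorted(counts, key=lambda k: reward[k] / consumption[k], reverse=True):
        if left <= 0:
            break
        x[j] = min(counts[j], left / consumption[j])
        left -= x[j] * consumption[j]
    return x


def resolving_policy(arrivals: List[str], samples: List[str], reward: Dict[str, float],
                     consumption: Dict[str, float], capacity: float) -> List[bool]:
    """Re-solve the fluid LP every period; accept if the LP serves >= half the forecast."""
    decisions, remaining, horizon = [], capacity, len(arrivals)
    for t, j in enumerate(arrivals):
        counts = {k: 0.0 for k in reward}
        for k in samples[t + 1:horizon]:  # type-only samples stand in for future demand
            counts[k] += 1.0
        counts[j] += 1.0                  # the query that has just arrived
        x = fluid_allocation(counts, reward, consumption, remaining)
        ok = consumption[j] <= remaining and x[j] / counts[j] >= 0.5  # rounding step
        if ok:
            remaining -= consumption[j]
        decisions.append(ok)
    return decisions
```

With two unit-size types "hi" (mean reward 10) and "lo" (mean reward 1), capacity 2, and arrivals matching the samples `["hi", "lo", "hi", "lo"]`, the policy accepts exactly the two "hi" queries, because each re-solve reserves the remaining unit for the "hi" arrival the samples predict.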

Abstract

We study online resource allocation under non-stationary demand with minimal offline data requirements. In this problem, a decision-maker must allocate multiple types of resources to sequentially arriving queries over a finite horizon. Each query belongs to a finite set of types with fixed resource consumption and a stochastic reward drawn from an unknown, type-specific distribution. Critically, the environment exhibits arbitrary non-stationarity (arrival distributions may shift unpredictably), while the algorithm requires only one historical sample per period to operate effectively. We distinguish two settings based on sample informativeness: (i) reward-observed samples containing both the query type and the reward realization, and (ii) the more challenging type-only samples, which reveal only the query type. We propose a novel type-dependent quantile-based meta-policy that decouples…
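To make the two sample models concrete, the following sketch encodes the setting described in the abstract: a finite set of query types with fixed resource consumption, a type-specific reward distribution, and a per-period arrival distribution that may change arbitrarily across periods. The historical data is one independent draw per period, carrying either the reward (setting (i)) or only the type (setting (ii)). `Instance` and `historical_samples` are hypothetical names chosen for this illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Instance:
    consumption: Dict[str, Tuple[float, ...]]                 # fixed resource use per type
    reward_dist: Dict[str, Callable[[random.Random], float]]  # unknown to the algorithm
    arrival_probs: List[Dict[str, float]]                     # p_t(j), may change with t

    def draw(self, t: int, rng: random.Random) -> Tuple[str, float]:
        """Sample one query (type, reward) from period t's arrival distribution."""
        types = list(self.arrival_probs[t])
        weights = [self.arrival_probs[t][k] for k in types]
        j = rng.choices(types, weights=weights)[0]
        return j, self.reward_dist[j](rng)


def historical_samples(inst: Instance, rng: random.Random,
                       reward_observed: bool) -> List[Tuple[str, Optional[float]]]:
    """One independent draw per period t = 0..T-1.

    reward_observed=True  -> (type, reward) pairs, setting (i)
    reward_observed=False -> (type, None): only the type is revealed, setting (ii)
    """
    out: List[Tuple[str, Optional[float]]] = []
    for t in range(len(inst.arrival_probs)):
        j, r = inst.draw(t, rng)
        out.append((j, r if reward_observed else None))
    return out
```

The online queries would be further independent calls to `draw` for each period, so the single historical sample and the live arrival of period t share a distribution but not a realization.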


Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Caching and Content Delivery