A Two-Layer Framework for Joint Online Configuration Selection and Admission Control
Owen Shen, Haoran Xu, Yinyu Ye, Peter Glynn, Patrick Jaillet

TL;DR
This paper introduces a two-layer online framework for configuration selection and admission control, providing a new benchmark and an algorithm with sublinear regret for applications like LLM serving and GPU scheduling.
Contribution
It proposes a novel two-layer framework with a switching-aware fluid oracle and develops an algorithm with provable regret bounds for joint configuration and admission decisions.
Findings
The switching-aware fluid oracle bounds online policy performance.
The max-min formulation characterizes saddle points for benchmarking.
The SP-UCB--OLP algorithm achieves $\tilde{O}(\sqrt{KT})$ regret.
Abstract
We study the online configuration selection with admission control problem, which arises in LLM serving, GPU scheduling, and revenue management. Over a planning horizon of $T$ periods, we consider a two-layer framework for the decisions made within each period. In the first layer, the decision maker selects one of $K$ configurations (e.g., quantization, parallelism, fare class), which induces a distribution over the reward-resource pair of the incoming request. In the second layer, the decision maker observes the request and then decides whether to accept it. Benchmarking this framework requires care. We introduce a switching-aware fluid oracle that accounts for the value of mixing configurations over time and provably upper-bounds any online policy. We derive a max-min formulation for evaluating the benchmark, and we characterize saddle points of the max-min problem…
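The two-layer interaction described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's SP-UCB--OLP algorithm; it only simulates the decision protocol. The function `simulate`, the callables `choose_config` and `admit`, and the single scalar resource budget are all assumptions made for illustration.

```python
import random


def simulate(T, budget, configs, choose_config, admit, seed=0):
    """Run the two-layer interaction for T periods (illustrative sketch).

    configs: list of hypothetical samplers, each rng -> (reward, resource).
    choose_config: hypothetical policy (t, remaining) -> index into configs.
    admit: hypothetical policy (reward, resource, remaining) -> bool.
    Assumes a single scalar resource budget, which is a simplification.
    """
    rng = random.Random(seed)
    remaining = budget
    total_reward = 0.0
    for t in range(T):
        # Layer 1: pick a configuration before the request is seen.
        k = choose_config(t, remaining)
        # The chosen configuration induces the request's (reward, resource) pair.
        reward, resource = configs[k](rng)
        # Layer 2: observe the request, then accept or reject it.
        if resource <= remaining and admit(reward, resource, remaining):
            remaining -= resource
            total_reward += reward
    return total_reward, remaining


# Example: two made-up configurations, round-robin selection,
# and a reward-per-resource threshold for admission.
configs = [
    lambda rng: (rng.uniform(0.0, 1.0), 1.0),
    lambda rng: (rng.uniform(0.5, 1.5), 2.0),
]
total, left = simulate(
    T=100,
    budget=40.0,
    configs=configs,
    choose_config=lambda t, remaining: t % 2,
    admit=lambda reward, resource, remaining: reward / resource >= 0.5,
)
```

Keeping the two policies as separate callables mirrors the separation between the configuration layer and the admission layer; a learning algorithm would replace both with rules that adapt to observed requests.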
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization
