Learning Unknown Service Rates in Queues: A Multi-Armed Bandit Approach
Subhashini Krishnasamy, Rajat Sen, Ramesh Johari, Sanjay Shakkottai

TL;DR
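This paper studies how to learn unknown service rates in a queueing system using multi-armed bandit algorithms. It shows that queue-regret behaves differently from classical regret, scaling logarithmically with time at first and inversely with time in the late stage, and it proposes an algorithm that achieves the optimal asymptotic queue-regret.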
Contribution
The paper introduces an analysis of queue-regret in multi-armed bandit queueing models, shows that queue-regret transitions from logarithmic scaling to inverse-time scaling, and provides an algorithm that attains the optimal inverse-time rate.
Findings
Queue-regret initially scales logarithmically with time.
A transition to an inverse-time regret scaling occurs in the late stage.
The proposed algorithm achieves asymptotically optimal queue-regret.
Abstract
Consider a queueing system consisting of multiple servers. Jobs arrive over time and enter a queue for service; the goal is to minimize the size of this queue. At each opportunity for service, at most one server can be chosen, and at most one job can be served. Service is successful with a probability (the service probability) that is a priori unknown for each server. An algorithm that knows the service probabilities (the "genie") can always choose the server of highest service probability. We study algorithms that learn the unknown service probabilities. Our goal is to minimize queue-regret: the (expected) difference between the queue-lengths obtained by the algorithm, and those obtained by the "genie." Since queue-regret cannot be larger than classical regret, results for the standard multi-armed bandit problem give algorithms for which queue-regret increases no more than…
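The model in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it uses standard UCB1 as a stand-in learner, and it assumes discrete time slots, Bernoulli arrivals with rate `lam`, service attempted before the arrival in each slot, and feedback only when the learner's queue is non-empty. The names `ucb1_choice`, `simulate_queue_regret`, `mu`, and `lam` are hypothetical. The learner and the genie share the same arrivals and the same service randomness, so the returned difference is measured on one coupled sample path.

```python
import math
import random


def ucb1_choice(counts, successes, t):
    """Pick a server with standard UCB1 (a stand-in, not the paper's method)."""
    for k, n in enumerate(counts):
        if n == 0:
            return k  # try every server once before using the index
    scores = [
        successes[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k])
        for k in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda k: scores[k])


def simulate_queue_regret(mu, lam, horizon, seed=0):
    """Return, per slot, learner queue length minus genie queue length."""
    rng = random.Random(seed)
    best = max(range(len(mu)), key=lambda k: mu[k])  # the genie's server
    counts = [0] * len(mu)      # service attempts per server
    successes = [0] * len(mu)   # successful services per server
    q_learner, q_genie = 0, 0
    diff = []
    for t in range(1, horizon + 1):
        u = rng.random()               # shared service randomness
        arrival = rng.random() < lam   # shared Bernoulli arrival
        # Genie: always uses the server with the highest service probability.
        if q_genie > 0 and u < mu[best]:
            q_genie -= 1
        # Learner: chooses a server only when a job is waiting (assumption).
        if q_learner > 0:
            k = ucb1_choice(counts, successes, t)
            served = u < mu[k]
            counts[k] += 1
            successes[k] += int(served)
            if served:
                q_learner -= 1
        q_genie += int(arrival)
        q_learner += int(arrival)
        diff.append(q_learner - q_genie)
    return diff
```

Averaging the returned list over many seeds gives a rough empirical estimate of queue-regret at each time slot, for example by calling `simulate_queue_regret([0.3, 0.5, 0.8], 0.4, 2000, seed=s)` for a range of `s`. Because the service outcome for every server is derived from the same uniform draw, the best server succeeds whenever any server does, so under this coupling the learner's queue is never shorter than the genie's.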
