Learning to Admit Optimally in an $M/M/k/k+N$ Queueing System with Unknown Service Rate
Saghar Adler, Mehrdad Moharrami, Vijay Subramanian

TL;DR
This paper develops a learning-based admission control policy for an $M/M/k/k+N$ queueing system with unknown service rate. The policy aims to maximize the long-term average reward without observing service times or departures.
Contribution
It introduces a parametric learning approach whose policy converges asymptotically to the optimal policy, and it provides finite-time regret guarantees.
Findings
Policy asymptotically converges to optimal
Finite-time regret bounds vary by parameter regime
Effective learning without observing service times
Abstract
Motivated by applications of the Erlang-B blocking model and the extended model that allows for some queueing, beyond communication networks to sizing and pricing in production, messaging, and app-based parking systems, we study admission control for such systems with unknown service rate. In our model, a dispatcher either admits every arrival into the system (when there is room) or blocks it. Every served job yields a fixed reward but incurs a per-unit-time holding cost, which includes any time spent waiting in the queue for service. We aim to design a dispatching policy that maximizes the long-term average reward by observing arrival times and system state at arrivals, a realistic decision-event-driven sampling of such systems. The dispatcher observes neither service times nor departure epochs, which excludes the use of reward-based reinforcement learning…
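To make the reward structure in the abstract concrete, here is a minimal sketch of the expected net reward of admitting a job, assuming the service rate is known (in the paper it is not and has to be learned). The function names, the first-come-first-served discipline, and the myopic rule "admit when the expected net reward is positive" are illustrative assumptions, not the paper's policy.

```python
def expected_sojourn_time(n: int, k: int, mu: float) -> float:
    """Expected time in system for a job admitted when n jobs are present.

    Assumes k identical exponential servers with rate mu each and FIFO order.
    """
    if n < k:
        # A server is free, so the job only spends its own service time.
        return 1.0 / mu
    # All k servers are busy: the job waits for (n - k + 1) departures,
    # each taking 1 / (k * mu) on average, and is then served itself.
    return (n - k + 1) / (k * mu) + 1.0 / mu


def myopic_admit(n: int, k: int, N: int, mu: float,
                 reward: float, holding_cost: float) -> bool:
    """Hypothetical myopic rule: admit if expected net reward is positive."""
    if n >= k + N:
        # No room: k in service plus N waiting is the system capacity.
        return False
    net = reward - holding_cost * expected_sojourn_time(n, k, mu)
    return net > 0.0
```

With `k=2`, `mu=1.0`, `reward=1.2` and `holding_cost=1.0`, this rule admits a job that finds one job in the system (expected sojourn time 1.0) and blocks one that finds two (expected sojourn time 1.5). A rule of this kind depends on `mu`, which is why an unknown service rate makes the admission decision a learning problem.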
Taxonomy
Topics: Smart Grid Energy Management · Age of Information Optimization
