Learning payoffs while routing in skill-based queues
Sanne van Kempen, Jaron Sanders, Fiona Sloothaak, Maarten G. Wolf

TL;DR
This paper develops a machine learning algorithm for routing in skill-based queues that adaptively learns payoff parameters and maximizes total payoff, achieving near-optimal regret bounds in dynamic service systems.
Contribution
It introduces a novel algorithm that learns customer-server payoffs while routing, with proven regret bounds and asymptotic optimality, addressing the complex queue-learning interaction.
Findings
Achieves polylogarithmic regret in learning payoffs.
Proves asymptotic optimality up to logarithmic terms by deriving a regret lower bound.
Demonstrates effectiveness in time-varying environments.
Abstract
Motivated by applications in service systems, we consider queueing systems where each customer must be handled by a server with the right skill set. We focus on optimizing the routing of customers to servers in order to maximize the total payoff of customer-server matches. In addition, customer-server dependent payoff parameters are assumed to be unknown a priori. We construct a machine learning algorithm that adaptively learns the payoff parameters while maximizing the total payoff and prove that it achieves polylogarithmic regret. Moreover, we show that the algorithm is asymptotically optimal up to logarithmic terms by deriving a regret lower bound. The algorithm leverages the basic feasible solutions of a static linear program as the action space. The regret analysis overcomes the complex interplay between queueing and learning by analyzing the convergence of the queue length…
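To make the idea of a finite action space concrete, here is a minimal Python sketch. It is not the paper's algorithm: the two routing matrices in `ACTIONS` are hypothetical stand-ins for basic feasible solutions of a static linear program (hard-coded rather than computed), the observations are made-up numbers, and the selection rule is a plain greedy choice on empirical mean payoffs, with no exploration and no queueing dynamics.

```python
# Hypothetical instance with 2 customer types and 2 servers; all numbers are made up.
# Each action maps (customer_type, server) -> routing rate and stands in for one
# basic feasible solution of a static LP (assumed given, not computed here).
ACTIONS = [
    {(0, 0): 1.0, (1, 1): 1.0},  # type 0 -> server 0, type 1 -> server 1
    {(0, 1): 1.0, (1, 0): 1.0},  # type 0 -> server 1, type 1 -> server 0
]


def empirical_means(observations):
    """Average the observed payoffs for each (customer_type, server) pair."""
    sums, counts = {}, {}
    for pair, payoff in observations:
        sums[pair] = sums.get(pair, 0.0) + payoff
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def action_value(action, theta_hat):
    """Estimated payoff rate of an action: sum of rate * estimated payoff.

    Pairs with no observations are treated as payoff 0.0 (an assumption).
    """
    return sum(rate * theta_hat.get(pair, 0.0) for pair, rate in action.items())


def best_action(actions, theta_hat):
    """Index of the action with the largest estimated payoff rate (greedy)."""
    return max(range(len(actions)), key=lambda k: action_value(actions[k], theta_hat))


# Made-up payoff observations: ((customer_type, server), observed payoff).
observations = [
    ((0, 0), 1.0), ((0, 0), 3.0),
    ((1, 1), 2.0),
    ((0, 1), 5.0),
    ((1, 0), 4.0),
]

theta_hat = empirical_means(observations)
chosen = best_action(ACTIONS, theta_hat)
print("estimated payoffs:", theta_hat)
print("chosen action index:", chosen)
```

Because the candidate set is finite, choosing a routing rule reduces to comparing a handful of estimated values. A learning algorithm with regret guarantees would additionally have to account for estimation uncertainty and for the queue lengths, which this sketch omits.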
