Queueing Matching Bandits with Preference Feedback
Jung-hun Kim, Min-hwan Oh

TL;DR
This paper introduces algorithms for queueing systems with preference-based server-job matching, balancing system stability and learning unknown service rates, with proven bounds and experimental validation.
Contribution
It proposes UCB and Thompson Sampling algorithms for queueing matching with preference feedback, achieving stability and sublinear regret bounds.
Findings
The proposed algorithms stabilize the queues, keeping the average queue length bounded.
The regret bounds are sublinear in the time horizon.
Experimental results confirm theoretical performance.
Abstract
In this study, we consider multi-class multi-server asymmetric queueing systems consisting of queues on one side and servers on the other side, where jobs randomly arrive in queues at each time. The service rate of each job-server assignment is unknown and modeled by a feature-based multinomial logit (MNL) function. At each time, a scheduler assigns jobs to servers, and each server stochastically serves at most one job based on its preferences over the assigned jobs. The primary goal of the algorithm is to stabilize the queues in the system while learning the service rates of servers. To achieve this goal, we propose algorithms based on UCB and Thompson Sampling, which achieve system stability with a bounded average queue length for a large time horizon, where the bound depends on the traffic slackness of the system. Furthermore, the algorithms achieve…
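To make the feature-based MNL service model concrete, the following is a minimal sketch of how a single server might pick at most one of its assigned jobs. It assumes the standard MNL form with an outside option of utility zero (the server serves nothing); the function names, the feature layout, and the parameter vector `theta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def mnl_service_probs(features, theta):
    """Return MNL probabilities for one server (assumed standard MNL form).

    features: (m, d) array, one feature vector per assigned job (hypothetical layout).
    theta:    (d,) preference parameter of the server (unknown in the paper's setting).
    Output:   (m + 1,) array; the last entry is the probability that no job is served.
    """
    utilities = np.asarray(features, dtype=float) @ np.asarray(theta, dtype=float)
    # Append the outside option with utility 0, then shift by the max for numerical stability.
    u = np.append(utilities, 0.0)
    u = u - u.max()
    weights = np.exp(u)
    return weights / weights.sum()


def sample_served_job(features, theta, rng):
    """Sample the index of the served job, or None if the server serves no job."""
    probs = mnl_service_probs(features, theta)
    choice = rng.choice(len(probs), p=probs)
    return None if choice == len(probs) - 1 else int(choice)


# Hypothetical example: three assigned jobs with two-dimensional features.
rng = np.random.default_rng(0)
job_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta_example = np.array([0.5, -0.2])
probs_example = mnl_service_probs(job_features, theta_example)
served = sample_served_job(job_features, theta_example, rng)
```

In this sketch the service rate of a job-server pair is the corresponding entry of `probs_example`, so it depends on which other jobs are assigned to the same server. A learning algorithm such as the UCB or Thompson Sampling variants mentioned in the abstract would have to estimate `theta` from the observed served jobs.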
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Auction Theory and Applications
