Learning to Route and Schedule LLMs from User Retrials via Contextual Queueing Bandits

Seoungbin Bae; Junyoung Son; Dabeen Lee

arXiv:2602.02061·cs.LG·February 3, 2026

Open Access

TL;DR

This paper introduces a novel online learning framework for routing and scheduling in conversational LLM services, leveraging user retrial behaviors as implicit feedback to improve efficiency and user experience.

Contribution

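The paper proposes the CQB-MNL framework (contextual queueing bandits with multinomial logit feedback) and the ACQB (anytime CQB) algorithm, which combines contextual bandit learning with queue management to route and schedule LLM queries using feedback inferred from user retrials.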

Findings

1. ACQB achieves near-optimal regret bounds for routing and scheduling.

2. Experiments show ACQB outperforms baseline algorithms on multiple datasets.

3. Contrastive learning improves query embedding quality.

Abstract

Explosive demands for LLMs often cause user queries to accumulate in server queues, requiring efficient routing (query-LLM matching) and scheduling (query prioritization) mechanisms. Several online algorithms are being deployed, but they overlook the following two key challenges inherent to conversational LLM services: (1) unsatisfied users may retry queries, increasing the server backlog, and (2) requests for "explicit" feedback, such as ratings, degrade user experiences. In this paper, we develop a joint routing and scheduling algorithm that leverages "implicit" feedback inferred from user retrial behaviors. The key idea is to propose and study the framework of contextual queueing bandits with multinomial logit feedback (CQB-MNL). CQB-MNL models query retrials, as well as context-based learning for user preferences over LLMs. Our algorithm, anytime CQB (ACQB), achieves efficient…
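The abstract names multinomial logit (MNL) feedback but, as excerpted here, does not spell out the model. As a generic illustration only, the sketch below computes standard MNL probabilities over a set of options plus a baseline outcome with utility zero. The function names, the two-dimensional context, the linear utilities, and the parameter values are all hypothetical and are not taken from the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def mnl_probabilities(utilities):
    """Generic multinomial logit probabilities.

    Returns (option_probs, baseline_prob), where the baseline outcome
    has utility 0. This is the textbook MNL form, not necessarily the
    exact parameterization used in CQB-MNL.
    """
    # Subtract the largest utility (including the baseline's 0) for
    # numerical stability before exponentiating.
    m = max([0.0] + list(utilities))
    exps = [math.exp(u - m) for u in utilities]
    base = math.exp(-m)
    total = base + sum(exps)
    return [e / total for e in exps], base / total


# Hypothetical query context and per-LLM parameters (assumptions).
context = [1.0, 0.5]
theta = {"llm_a": [0.2, 0.4], "llm_b": [-0.1, 0.3]}

utilities = [dot(context, w) for w in theta.values()]
probs, baseline = mnl_probabilities(utilities)
for name, p in zip(theta, probs):
    print(f"{name}: {p:.3f}")
print(f"baseline: {baseline:.3f}")
```

In a bandit setting, parameters such as `theta` would be unknown and estimated online from observed outcomes; how ACQB performs that estimation is not covered by this excerpt.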

Taxonomy

Topics: Caching and Content Delivery · Age of Information Optimization · Advanced Bandit Algorithms Research