Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution

Manhin Poon; XiangXiang Dai; Xutong Liu; Fang Kong; John C.S. Lui; Jinhang Zuo

arXiv:2506.17670·cs.LG·June 24, 2025

TL;DR

This paper introduces an online contextual bandit framework that adaptively selects the most suitable large language model for each user query, handling unstructured prompt evolution without requiring offline data.

Contribution

The paper presents the first contextual bandit approach to sequential LLM selection under unstructured prompt dynamics, with theoretical guarantees and practical extensions that account for cost and user preferences.

Findings

01 Outperforms existing LLM routing strategies in accuracy

02 Achieves lower costs in diverse benchmarks

03 Provides sublinear regret guarantees

Abstract

Large language models (LLMs) exhibit diverse response behaviors, costs, and strengths, making it challenging to select the most suitable LLM for a given user query. We study the problem of adaptive multi-LLM selection in an online setting, where the learner interacts with users through multi-step query refinement and must choose LLMs sequentially without access to offline datasets or model internals. A key challenge arises from unstructured context evolution: the prompt dynamically changes in response to previous model outputs via a black-box process, which cannot be simulated, modeled, or learned. To address this, we propose the first contextual bandit framework for sequential LLM selection under unstructured prompt dynamics. We formalize a notion of myopic regret and develop a LinUCB-based algorithm that provably achieves sublinear regret without relying on future context prediction.…
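
The abstract says only that the algorithm is LinUCB-based, so the following is a minimal sketch of textbook disjoint LinUCB applied to LLM selection, not the authors' exact method. It assumes each prompt is already mapped to a feature vector `x` (for example, an embedding) and that a scalar reward is observed after the chosen LLM responds. The class name `LinUCBSelector` and the parameters `alpha` and `ridge` are hypothetical names introduced for illustration.

```python
import numpy as np


class LinUCBSelector:
    """Disjoint LinUCB: one ridge-regression model per arm (one arm per candidate LLM)."""

    def __init__(self, n_arms, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # exploration weight (hypothetical default)
        # A[a] = ridge * I + sum of x x^T over rounds where arm a was played
        self.A = np.stack([ridge * np.eye(dim) for _ in range(n_arms)])
        # b[a] = sum of reward * x over rounds where arm a was played
        self.b = np.zeros((n_arms, dim))

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a                           # ridge estimate of the arm's weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # confidence width for this context
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed reward for the played arm into its statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


if __name__ == "__main__":
    # Hypothetical interaction: 3 candidate LLMs, 4-dimensional prompt features.
    selector = LinUCBSelector(n_arms=3, dim=4)
    x = np.array([0.5, 0.1, 0.0, 0.4])
    arm = selector.select(x)
    selector.update(arm, x, reward=1.0)  # reward would come from user feedback
```

The selector scores only the current context and never tries to predict how the prompt will change after the chosen model answers, which is in the spirit of the myopic setting the abstract describes. In a multi-step refinement session, `select` and `update` would be called once per step with whatever prompt features are observed at that step.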
