# Adaptive LLM Routing under Budget Constraints

**Authors:** Pranoy Panda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, Vishal Sharma

arXiv: 2508.21141 · 2025-09-10

## TL;DR

This paper introduces PILOT, a bandit-based LLM router built as an extension of LinUCB. It selects a model for each query using online bandit feedback over a shared query-LLM embedding space, and pairs this with an online cost policy that respects user budget constraints.

## Contribution

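The paper formulates LLM routing as a contextual bandit problem. It develops a shared embedding space for queries and LLMs, learned offline from human preference data and refined through online bandit feedback, and an online cost policy that keeps routing resource-efficient under a user budget.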
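To make the bandit formulation concrete, here is a minimal sketch of standard disjoint LinUCB (the algorithm the abstract names as PILOT's base) in which each arm is an LLM and the context is a query embedding. It is not the paper's implementation. The class name `PriorInformedLinUCB`, the parameters `alpha` and `reg`, and the choice to initialize each arm's parameter vector from an offline-learned LLM embedding are all assumptions made for illustration; only the Python standard library is used.

```python
import math


def _matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class PriorInformedLinUCB:
    """Disjoint LinUCB whose per-arm parameters start from a prior vector.

    Hypothetical sketch: the prior initialization is an assumption about how
    offline embeddings could seed the bandit, not the paper's exact method.
    """

    def __init__(self, prior_arm_embeddings, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.d = len(prior_arm_embeddings[0])
        # Store A_a^{-1} directly; A_a starts as reg * I.
        self.A_inv = [
            [[(1.0 / reg if i == j else 0.0) for j in range(self.d)]
             for i in range(self.d)]
            for _ in prior_arm_embeddings
        ]
        # b_a = reg * prior_a, so theta_a = A_a^{-1} b_a equals the prior at t = 0.
        self.b = [[reg * v for v in prior] for prior in prior_arm_embeddings]

    def scores(self, x):
        """Upper confidence bound for every arm given query embedding x."""
        out = []
        for A_inv, b in zip(self.A_inv, self.b):
            theta = _matvec(A_inv, b)
            A_inv_x = _matvec(A_inv, x)
            bonus = self.alpha * math.sqrt(_dot(x, A_inv_x))
            out.append(_dot(theta, x) + bonus)
        return out

    def select(self, x):
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def update(self, arm, x, reward):
        """Bandit feedback: only the chosen arm observes a reward."""
        A_inv = self.A_inv[arm]
        u = _matvec(A_inv, x)  # A^{-1} x; A^{-1} is symmetric
        denom = 1.0 + _dot(x, u)
        # Sherman-Morrison update for (A + x x^T)^{-1}.
        for i in range(self.d):
            for j in range(self.d):
                A_inv[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [b_i + reward * x_i for b_i, x_i in zip(self.b[arm], x)]
```

A router built this way would call `select` on each incoming query embedding, send the query to the chosen LLM, and pass the observed quality signal to `update`. No other LLM is queried, which is the contrast with supervised routing drawn in the abstract.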

## Key findings

- PILOT outperforms supervised routing methods in dynamic scenarios.
- The shared embedding space effectively captures query-LLM affinities.
- The online cost policy, modeled as a multi-choice knapsack problem, keeps routing resource-efficient across diverse user budgets.
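As a rough illustration of routing under a budget, the sketch below uses a simple greedy rule: for each query, pick the highest-scoring model whose cost still fits the remaining budget. This is a deliberately simplified stand-in and not the paper's cost policy, which this summary does not describe in detail. The function names `route_within_budget` and `run_stream`, and the per-query score and cost inputs, are hypothetical.

```python
def route_within_budget(scores, costs, remaining_budget):
    """Return the index of the best-scoring affordable model, or None.

    Greedy baseline for illustration only (an assumption, not PILOT's policy).
    """
    feasible = [i for i, c in enumerate(costs) if c <= remaining_budget]
    if not feasible:
        return None  # no model is affordable for this query
    return max(feasible, key=lambda i: scores[i])


def run_stream(score_rows, cost_rows, budget):
    """Route a stream of queries one at a time, spending from a fixed budget."""
    choices = []
    remaining = budget
    for scores, costs in zip(score_rows, cost_rows):
        choice = route_within_budget(scores, costs, remaining)
        choices.append(choice)
        if choice is not None:
            remaining -= costs[choice]
    return choices, remaining
```

A greedy rule like this can exhaust the budget on early queries, which is one reason to prefer a policy that reasons about the whole stream of queries, as a knapsack formulation does.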

## Abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for each query/task. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. However, real-world scenarios lack such comprehensive mappings and face evolving user queries. We thus propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback without requiring exhaustive inference across all LLMs for all queries (in contrast to supervised routing). To address this problem, we develop a shared embedding space for queries and LLMs, where query and LLM embeddings are aligned to reflect their affinity. This space is initially learned from offline human preference data and refined through online bandit feedback. We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB. To handle diverse user budgets for model routing, we introduce an online cost policy modeled as a multi-choice knapsack problem, ensuring resource-efficient routing.
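For readers unfamiliar with the multi-choice knapsack problem mentioned above, its standard offline form can be written as follows. The notation is assumed here for illustration and may differ from the paper's: $T$ queries, $K$ candidate LLMs, $r_{t,i}$ the estimated reward of LLM $i$ on query $t$, $c_{t,i}$ its cost, $B$ the total budget, and $x_{t,i}$ a binary indicator that query $t$ is routed to LLM $i$.

$$
\max_{x} \; \sum_{t=1}^{T} \sum_{i=1}^{K} r_{t,i}\, x_{t,i}
\quad \text{s.t.} \quad
\sum_{t=1}^{T} \sum_{i=1}^{K} c_{t,i}\, x_{t,i} \le B,
\qquad
\sum_{i=1}^{K} x_{t,i} \le 1 \;\; \forall t,
\qquad
x_{t,i} \in \{0, 1\}.
$$

In the online setting, the choice for query $t$ has to be made before later queries are seen, so the policy cannot solve this program directly and must decide from the budget spent so far.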

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21141/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21141/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/2508.21141/full.md

---
Source: https://tomesphere.com/paper/2508.21141