OmniRouter: Budget and Performance Controllable Multi-LLM Routing

Kai Mei; Wujiang Xu; Minghao Guo; Shuhang Lin; Yongfeng Zhang

arXiv:2502.20576·cs.DB·December 1, 2025

TL;DR

OmniRouter introduces a globally optimized multi-LLM routing framework that balances cost and performance, improving accuracy and reducing computational costs compared to existing methods.

Contribution

It models LLM routing as a constrained optimization problem and employs a hybrid predictor with a Lagrangian dual optimizer for globally optimal resource allocation.
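As a rough illustration of the idea (not the paper's implementation), the sketch below routes a batch of queries under a total cost budget using a single Lagrange multiplier. Everything in it is assumed for illustration: the function name `route_with_budget`, the per-query `quality` and `cost` tables (standing in for whatever the predictor would output), and the use of bisection to tune the multiplier. For a fixed multiplier the objective decomposes per query, so each query picks the model that maximizes predicted quality minus the multiplier times cost. Because the choices are discrete, this relaxation returns an assignment that fits the budget but is not guaranteed to be exactly optimal.

```python
def route_with_budget(quality, cost, budget, iters=50):
    """Pick one model index per query so that total cost <= budget.

    quality[i][j] and cost[i][j] are hypothetical predicted quality and
    cost of sending query i to model j (assumed inputs, not the paper's API).
    """

    def assign(lam):
        # For a fixed multiplier, each query independently maximizes
        # quality - lam * cost.
        return [
            max(range(len(q)), key=lambda j: q[j] - lam * c[j])
            for q, c in zip(quality, cost)
        ]

    def total_cost(choice):
        return sum(c[j] for c, j in zip(cost, choice))

    # If the unconstrained (per-query greedy) choice fits, the budget is not binding.
    choice = assign(0.0)
    if total_cost(choice) <= budget:
        return choice

    # Grow the multiplier until the assignment becomes affordable.
    lo, hi = 0.0, 1.0
    while total_cost(assign(hi)) > budget:
        lo, hi = hi, hi * 2.0
        if hi > 1e12:
            raise ValueError("budget is too small for any assignment")

    # Bisect for the smallest multiplier whose assignment fits the budget.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_cost(assign(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return assign(hi)


# Toy data: model 0 is "large" (cost 10), model 1 is "small" (cost 1).
quality = [[0.90, 0.60], [0.80, 0.75], [0.95, 0.50]]
cost = [[10, 1], [10, 1], [10, 1]]
routes = route_with_budget(quality, cost, budget=12)
```

In this toy batch the budget allows only one call to the large model, and the multiplier ends up reserving it for the third query, where the predicted quality gap is widest. A per-query greedy router would have sent all three queries to the large model and exceeded the budget.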

Findings

01 Achieves up to 6.30% higher response accuracy.

02 Reduces computational costs by at least 10.15%.

03 Demonstrates effective global resource management in multi-LLM serving.

Abstract

Large language models (LLMs) deliver superior performance but require substantial computational resources and operate with relatively low efficiency, while smaller models can efficiently handle simpler tasks with fewer resources. LLM routing is a crucial paradigm that dynamically selects the most suitable large language models from a pool of candidates to process diverse inputs, ensuring optimal resource utilization while maintaining response quality. Existing routing frameworks typically model this as a locally optimal decision-making problem, selecting the presumed best-fit LLM for each query individually, which overlooks global budget constraints, resulting in ineffective resource allocation. To tackle this problem, we introduce OmniRouter, a fundamentally controllable routing framework for multi-LLM serving. Instead of making per-query greedy choices, OmniRouter models the routing…

Taxonomy

Topics: Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing · Service-Oriented Architecture and Web Services