MixLLM: Dynamic Routing in Mixed Large Language Models

Xinyuan Wang; Yanchi Liu; Wei Cheng; Xujiang Zhao; Zhengzhang Chen; Wenchao Yu; Yanjie Fu; Haifeng Chen

arXiv:2502.18482·cs.CL·February 27, 2025

TL;DR

MixLLM is a dynamic routing system for large language models that optimizes query assignment to balance response quality, cost, and latency, adapting over time with continual learning.

Contribution

We introduce MixLLM, a novel system using contextual bandits and continual training to effectively route queries among mixed LLMs, addressing dynamic trade-offs and model set changes.

Findings

01 Achieves 97.25% of GPT-4's quality at 24.18% of the cost

02 Outperforms baseline routing methods in response quality and efficiency

03 Adapts to evolving query patterns and model sets over time

Abstract

Large Language Models (LLMs) have recently shown potential for artificial general intelligence; however, they are costly to use and have high response latency. Given a mix of LLMs, each with its own strengths and weaknesses, LLM routing aims to identify the most suitable model for each query in the stream, maximizing response quality while minimizing cost and latency. The challenges include: (1) dynamic trade-offs among quality, cost, and latency; (2) enabling continual learning in deployed systems; and (3) navigating a set of LLM candidates that varies over time (e.g., when a new LLM is added or an old one is removed). To bridge these gaps, we develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment. Specifically, we first leverage query tags to enhance query embeddings for the routing task. Next, we design lightweight prediction models to estimate the response qualities and costs of…
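
To make the routing idea in the abstract concrete, here is a minimal Python sketch of a cost-aware router with online feedback. It is not the authors' implementation: the class names `Arm` and `Router`, the linear quality model, the epsilon-greedy exploration, and the `cost_weight` parameter are all assumptions chosen for illustration, and latency is left out for brevity. The sketch only shows the general pattern of scoring each candidate model by predicted quality minus a cost penalty, updating the quality estimate from feedback, and allowing models to be added or removed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Arm:
    """One candidate LLM (hypothetical): a fixed per-query cost and a linear quality model."""
    name: str
    cost: float
    weights: list = field(default_factory=list)

    def predict_quality(self, x):
        # Lazily size the weight vector to match the query features.
        if not self.weights:
            self.weights = [0.0] * len(x)
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, observed_quality, lr=0.1):
        # One gradient step on squared error between predicted and observed quality.
        err = observed_quality - self.predict_quality(x)
        self.weights = [w + lr * err * xi for w, xi in zip(self.weights, x)]


class Router:
    """Epsilon-greedy router: pick the arm with the best quality-minus-cost score."""

    def __init__(self, arms, cost_weight=0.5, epsilon=0.1, seed=0):
        self.arms = {a.name: a for a in arms}
        self.cost_weight = cost_weight  # assumed knob for the quality/cost trade-off
        self.epsilon = epsilon          # probability of exploring a random arm
        self.rng = random.Random(seed)

    def score(self, arm, x):
        return arm.predict_quality(x) - self.cost_weight * arm.cost

    def route(self, x):
        arms = list(self.arms.values())
        if self.rng.random() < self.epsilon:
            return self.rng.choice(arms).name
        return max(arms, key=lambda a: self.score(a, x)).name

    def feedback(self, name, x, quality):
        # Continual learning step: refine the chosen arm's quality estimate.
        self.arms[name].update(x, quality)

    def add_arm(self, arm):
        self.arms[arm.name] = arm

    def remove_arm(self, name):
        self.arms.pop(name, None)


if __name__ == "__main__":
    router = Router([Arm("large", cost=1.0), Arm("small", cost=0.1)], epsilon=0.0)
    query = [1.0]  # stand-in for a query embedding
    for _ in range(200):
        router.feedback("large", query, 0.9)
        router.feedback("small", query, 0.8)
    print(router.route(query))
```

With `cost_weight=0.5`, the cheaper model is selected once both quality estimates have converged, because its slightly lower quality is outweighed by its much lower cost; setting `cost_weight` to 0 makes the router prefer the higher-quality model instead.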

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Sparse Evolutionary Training