MixLLM: Dynamic Routing in Mixed Large Language Models
Xinyuan Wang, Yanchi Liu, Wei Cheng, Xujiang Zhao, Zhengzhang Chen, Wenchao Yu, Yanjie Fu, Haifeng Chen

TL;DR
MixLLM is a dynamic routing system for large language models that optimizes query assignment to balance response quality, cost, and latency, adapting over time with continual learning.
Contribution
We introduce MixLLM, a novel system using contextual bandits and continual training to effectively route queries among mixed LLMs, addressing dynamic trade-offs and model set changes.
Findings
Achieves 97.25% of GPT-4's quality at 24.18% of the cost
Outperforms baseline routing methods in response quality and efficiency
Adapts to evolving query patterns and model sets over time
Abstract
Large Language Models (LLMs) have recently shown potential for artificial general intelligence; however, they are costly to use and have high response latency. Given mixed LLMs with their own strengths and weaknesses, LLM routing aims to identify the most suitable model for each query in the stream to maximize response quality and minimize cost and latency. However, the challenges involve: (1) dynamic trade-offs among quality, cost, and latency; (2) enabling continual learning in deployed systems; and (3) navigating a varying (e.g., new LLM addition or old LLM removal) set of LLM candidates over time. To bridge these gaps, we develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment. Specifically, we first leverage query tags to enhance query embeddings for the routing task. Next, we design lightweight prediction models to estimate the response qualities and costs of…
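To make the routing idea concrete, the following is a minimal sketch of a bandit-style router that trades predicted quality against predicted cost. It is not the paper's implementation: the `Candidate` class, the `route` function, the linear score, the `cost_weight` and `explore_weight` parameters, and the UCB-style exploration bonus are all assumptions chosen for illustration, and latency is left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-query estimates for one LLM (not from the paper)."""
    name: str
    predicted_quality: float  # assumed to lie in [0, 1]
    predicted_cost: float     # assumed normalized to [0, 1]
    times_chosen: int         # how often the router has picked this model


def route(candidates, total_rounds, cost_weight=0.5, explore_weight=0.1):
    """Pick the candidate with the best quality-minus-cost score.

    The exploration bonus is a generic UCB-style term: models that were
    chosen less often receive a larger bonus, so they still get tried.
    """
    def score(c):
        bonus = explore_weight * math.sqrt(
            math.log(total_rounds + 1) / (c.times_chosen + 1)
        )
        return c.predicted_quality - cost_weight * c.predicted_cost + bonus

    return max(candidates, key=score)


# Illustrative numbers only; they are not results from the paper.
candidates = [
    Candidate("large", predicted_quality=0.9, predicted_cost=1.0, times_chosen=10),
    Candidate("small", predicted_quality=0.8, predicted_cost=0.1, times_chosen=10),
]
choice = route(candidates, total_rounds=20, cost_weight=0.5)
print(choice.name)
```

Under these assumed numbers, a nonzero `cost_weight` favors the cheaper model, while setting `cost_weight=0.0` reduces the rule to picking the highest predicted quality. Adding or removing an entry in `candidates` is how this sketch would reflect a changing model set.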
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
Methods: Sparse Evolutionary Training
