MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees
Herbert Woisetschläger, Ryan Zhang, Shiqiang Wang, Hans-Arno Jacobsen

TL;DR
MESS+ is a stochastic optimization method that dynamically routes LLM requests in model zoos, ensuring SLA compliance and reducing costs by learning request satisfaction probabilities in real time.
Contribution
Introduces MESS+, a new algorithm combining virtual queues and satisfaction prediction for cost-effective, SLA-guaranteed LLM request routing with theoretical analysis.
Findings
Achieves 2x cost savings over existing routing methods
Provides rigorous SLA compliance guarantees
Learns request satisfaction probabilities in real time
Abstract
Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm…
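The combination of a virtual queue with learned satisfaction probabilities can be illustrated with a minimal sketch. This is not the authors' implementation: the drift-plus-penalty style scoring rule, the running-average satisfaction estimate (a stand-in for a learned predictor), and all names and parameters (`route`, `update_queue`, `simulate`, `alpha`, `v`) are assumptions made for illustration only.

```python
import random


def route(costs, sat_probs, queue, v):
    """Pick the model index that minimizes v * cost - queue * sat_prob.

    Assumed scoring rule: a larger virtual queue (more accumulated SLA
    deficit) puts more weight on predicted satisfaction; a larger v puts
    more weight on cost.
    """
    scores = [v * c - queue * p for c, p in zip(costs, sat_probs)]
    return min(range(len(costs)), key=scores.__getitem__)


def update_queue(queue, alpha, satisfied):
    """Virtual queue update: grows by alpha per request, shrinks by 1
    when the request was satisfied, and never drops below zero."""
    return max(0.0, queue + alpha - (1.0 if satisfied else 0.0))


def simulate(costs, true_probs, alpha, v, steps, seed=0):
    """Run the routing loop against hypothetical per-model satisfaction rates."""
    rng = random.Random(seed)
    n = len(costs)
    counts = [0] * n  # how often each model was chosen
    hits = [0] * n    # how often each model satisfied the request
    queue = 0.0
    total_cost = 0.0
    total_sat = 0
    for _ in range(steps):
        # Running-average estimate; untried models get an optimistic 1.0
        # so that each model can be selected once the queue builds up.
        est = [hits[m] / counts[m] if counts[m] else 1.0 for m in range(n)]
        m = route(costs, est, queue, v)
        satisfied = rng.random() < true_probs[m]
        counts[m] += 1
        hits[m] += satisfied
        queue = update_queue(queue, alpha, satisfied)
        total_cost += costs[m]
        total_sat += satisfied
    return total_cost / steps, total_sat / steps, queue


if __name__ == "__main__":
    # Hypothetical zoo: a cheap, weaker model and an expensive, stronger one.
    avg_cost, avg_sat, final_queue = simulate(
        costs=[1.0, 5.0], true_probs=[0.6, 0.95], alpha=0.8, v=2.0, steps=5000
    )
    print(avg_cost, avg_sat, final_queue)
```

In this sketch, `alpha` plays the role of the SLA target (the required fraction of satisfied requests) and `v` trades cost against how quickly SLA deficits are corrected. Both are illustrative knobs rather than values taken from the paper.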
