PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems
Amit Singh Bhatti, Vishal Vaddina, Dagnachew Birru

TL;DR
PROTEUS is a Lagrangian RL-based router for multi-LLM systems that dynamically enforces accuracy SLAs, achieving high compliance and significant cost savings without retraining across a range of accuracy targets.
Contribution
We introduce PROTEUS, an SLA-aware routing system for multi-LLM serving that uses Lagrangian dual control to adapt to accuracy targets at runtime.
Findings
PROTEUS achieves over 90% accuracy compliance across various targets.
It reduces costs by up to 89.8% compared to fixed models.
PROTEUS operates effectively across a wide accuracy spectrum with a single model.
Abstract
Production LLM deployments serve diverse workloads where cost and quality requirements vary by customer tier, time of day, and query criticality. Model serving systems accept latency SLOs directly. LLM routers do not. They force operators to tune parameters offline and guess what accuracy might result. The relationship between parameters and outcomes is indirect, non-monotonic, and dataset-dependent. Operators need to specify accuracy targets, not infer them from opaque settings. We present PROTEUS (Polymorphic Router for Operational Target Enforcement with Unified SLA), a router that accepts accuracy targets tau as runtime input. PROTEUS uses Lagrangian dual control. A learned dual variable lambda tracks constraint violations during training and conditions the policy network. This lets the router translate specified tau values into routing decisions that satisfy them. A single trained…
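The Lagrangian dual control described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the learning rate, the toy two-model setup (a cheap model at 0.6 accuracy and an expensive one at 0.9), and the rule mapping lambda to a routing probability are all assumptions made for illustration. The sketch shows only the general idea of a dual variable that grows while observed accuracy falls short of the target tau and shrinks otherwise.

```python
def dual_update(lam: float, tau: float, observed_acc: float, lr: float) -> float:
    """Projected dual ascent step: raise lambda when accuracy is below tau.

    Hypothetical helper; lambda is clipped at zero so it stays a valid multiplier.
    """
    return max(0.0, lam + lr * (tau - observed_acc))


def lagrangian_reward(cost: float, correct: float, tau: float, lam: float) -> float:
    """Cost-minimizing reward with a lambda-weighted accuracy constraint term.

    Hypothetical form: -cost + lambda * (correct - tau).
    """
    return -cost + lam * (correct - tau)


def toy_expected_accuracy(lam: float) -> float:
    """Toy stand-in for a lambda-conditioned policy (an assumption, not PROTEUS).

    The probability of picking the expensive model rises with lambda:
    p = lam / (1 + lam). Cheap model accuracy 0.6, expensive 0.9.
    """
    p_expensive = lam / (1.0 + lam)
    return 0.6 + 0.3 * p_expensive


def run_toy_dual_ascent(tau: float, steps: int = 2000, lr: float = 1.0) -> float:
    """Iterate the dual update against the toy policy and return the final lambda."""
    lam = 0.0
    for _ in range(steps):
        lam = dual_update(lam, tau, toy_expected_accuracy(lam), lr)
    return lam


if __name__ == "__main__":
    lam = run_toy_dual_ascent(tau=0.8)
    print(f"lambda={lam:.3f}, accuracy={toy_expected_accuracy(lam):.3f}")
```

In this toy setting, lambda settles where the expected accuracy meets tau, which is the sense in which a dual variable "tracks constraint violations". A real router would learn the policy jointly with lambda rather than use a fixed mapping.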
Taxonomy
Topics: Software-Defined Networks and 5G · Network Traffic and Congestion Control · Network Packet Processing and Optimization
