When Routing Collapses: On the Degenerate Convergence of LLM Routers
Guannan Lai, Han-Jia Ye

TL;DR
This paper identifies a failure mode in LLM routing where routers default to the most expensive model as the budget increases, and proposes EquiRouter to directly learn model rankings, reducing costs and improving utilization.
Contribution
The paper introduces EquiRouter, a decision-aware routing method that directly learns model rankings to prevent routing collapse and better utilize smaller models.
Findings
EquiRouter reduces cost by about 17% at GPT-4-level performance.
Routing collapse is caused by a mismatch between the training objective (predicting scalar performance scores) and the decision criterion (discrete comparisons among candidate models).
EquiRouter effectively mitigates routing collapse and improves cost-efficiency.
Abstract
LLM routing aims to achieve a favorable quality-cost trade-off by dynamically assigning easy queries to smaller models and harder queries to stronger ones. However, across both unimodal and multimodal settings, we uncover a pervasive yet underexplored failure mode in existing routers: as the user's cost budget increases, routers systematically default to the most capable and most expensive model even when cheaper models already suffice. As a result, current routers under-utilize small models, wasting computation and monetary cost and undermining the core promise of routing; we term this phenomenon routing collapse. We attribute routing collapse to an objective-decision mismatch: many routers are trained to predict scalar performance scores, whereas routing decisions ultimately depend on discrete comparisons among candidate models. Consequently, small prediction errors can flip…
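The objective-decision mismatch described in the abstract can be illustrated with a toy sketch. This is not the paper's method or data: the model names, scores, costs, the `cost_weight` parameter, and the "score minus weighted cost" rule are all hypothetical, chosen only to show how a score-predicting router's discrete choice can change under a small prediction error.

```python
def route(pred_scores, costs, cost_weight):
    """Pick the model with the highest predicted score minus weighted cost.

    Hypothetical decision rule for illustration only; a smaller cost_weight
    stands in for a larger user budget.
    """
    utilities = {m: pred_scores[m] - cost_weight * costs[m] for m in pred_scores}
    return max(utilities, key=utilities.get)


# Hypothetical per-query numbers: the small model is nearly as good as the large one.
costs = {"small": 1.0, "large": 10.0}
true_scores = {"small": 0.80, "large": 0.82}
cost_weight = 0.005

# With the true scores, the cheaper model is the better choice.
choice_true = route(true_scores, costs, cost_weight)

# A predictor that is off by only 0.02 on each model changes the discrete decision.
noisy_scores = {"small": 0.78, "large": 0.84}
choice_noisy = route(noisy_scores, costs, cost_weight)

print(choice_true, choice_noisy)
```

In this toy setting the scalar predictions are close to the truth, yet the routing decision differs, because only the comparison between candidates matters to the router. A method that learns rankings directly targets that comparison rather than the absolute scores.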
Taxonomy
Topics: Software-Defined Networks and 5G · Network Packet Processing and Optimization · Complexity and Algorithms in Graphs
