A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev

TL;DR
This paper introduces a unified, theoretically optimal framework called cascade routing that combines routing and cascading strategies for large language models, improving cost-performance tradeoffs in model selection.
Contribution
The paper derives a formally optimal strategy for cascading, proves the optimality of an existing routing strategy, and proposes a unified approach, cascade routing, that outperforms existing model selection methods for LLMs.
Findings
Cascade routing significantly outperforms routing or cascading used on their own.
Good quality estimators are crucial for effective model selection.
The framework identifies the conditions under which routing or cascading is most beneficial.
Abstract
The availability of a wide range of large language models (LLMs) embedded in various agentic systems has significantly increased the potential of model selection strategies to improve the cost-performance tradeoff. Existing strategies involve either routing, where a single model is chosen per query, or cascading, which sequentially runs increasingly larger models until a satisfactory answer is found. However, current approaches face three key limitations: they (1) lack formal proofs of optimality, (2) fail to identify the conditions under which these strategies are most effective to improve the cost-performance tradeoff, and (3) are unable to combine both paradigms for further improvements. To address these issues, we first derive a novel optimal strategy for cascading and prove the optimality of an existing routing strategy. Further, we propose cascade routing, a unified framework that…
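To make the two paradigms in the abstract concrete, here is a minimal Python sketch of routing (one model chosen per query) and cascading (models run in sequence until an answer looks good enough). It is an illustration under stated assumptions, not the paper's optimal strategies: the `Model` tuple, the fixed per-query quality estimates, the `tradeoff` weight, and the `threshold` stopping rule are all hypothetical. In a real cascade the quality estimate would be computed from each model's actual answer rather than known in advance.

```python
from typing import List, Tuple

# Hypothetical model description: (name, cost per query, estimated quality on this query).
Model = Tuple[str, float, float]


def route(models: List[Model], tradeoff: float) -> str:
    """Routing: pick a single model by estimated quality minus weighted cost.

    `tradeoff` is an assumed knob; larger values favour cheaper models.
    """
    return max(models, key=lambda m: m[2] - tradeoff * m[1])[0]


def cascade(models: List[Model], threshold: float) -> Tuple[str, float]:
    """Cascading: run models from cheapest to most expensive.

    Stops at the first model whose quality estimate reaches `threshold`
    (an assumed, simple stopping rule). Returns the final model's name and
    the total cost of every model that was run. Assumes `models` is non-empty.
    """
    ordered = sorted(models, key=lambda m: m[1])
    total_cost = 0.0
    for name, cost, quality in ordered:
        total_cost += cost
        if quality >= threshold:
            return name, total_cost
    # No model met the threshold: fall back to the last (most expensive) one.
    return ordered[-1][0], total_cost


if __name__ == "__main__":
    models = [("small", 1.0, 0.5), ("medium", 3.0, 0.7), ("large", 10.0, 0.9)]
    print(route(models, tradeoff=0.05))
    print(cascade(models, threshold=0.6))
```

The sketch shows the cost asymmetry between the two: routing pays only for the selected model, while cascading pays for every model it tries before stopping.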
Taxonomy
Topics: Digital Rights Management and Security
Methods: Sparse Evolutionary Training
