Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades
Dylan Bouchard

TL;DR
This paper introduces a decision-theoretic framework for model cascades, analyzing cost-quality tradeoffs and optimal strategies for deploying multiple language models efficiently.
Contribution
It develops a formal optimization approach for cascade design, characterizes the cost-quality frontier, and validates findings on multiple benchmarks.
Findings
Deterministic fixed chains underperform pairwise cascades.
Optimized subsequence cascades do not significantly outperform the pairwise envelope.
A lightweight router often surpasses cascade policies because it avoids paying for the cheap model on queries sent directly to the expensive one.
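The cost argument behind the router finding can be illustrated with simple expected-cost accounting. This is a minimal sketch, not the paper's formulation: the function names, the per-query cost parameters, and the optional `c_router` overhead are all hypothetical, and it assumes the router sends the same fraction of queries to the expensive model that the cascade would defer.

```python
def cascade_cost(c_small: float, c_large: float, defer_rate: float) -> float:
    """Expected per-query cost of a two-model cascade.

    The cheap model runs on every query; the expensive model runs
    only on the deferred fraction (assumed accounting, for illustration).
    """
    return c_small + defer_rate * c_large


def router_cost(c_small: float, c_large: float, route_rate: float,
                c_router: float = 0.0) -> float:
    """Expected per-query cost of a router that picks one model up front.

    Each query pays for exactly one model, plus a hypothetical router
    overhead `c_router`.
    """
    return c_router + (1.0 - route_rate) * c_small + route_rate * c_large


def router_saving(c_small: float, c_large: float, rate: float,
                  c_router: float = 0.0) -> float:
    """Cost gap at equal escalation rate: rate * c_small - c_router."""
    return (cascade_cost(c_small, c_large, rate)
            - router_cost(c_small, c_large, rate, c_router))
```

Under these assumptions the router saves `rate * c_small - c_router` per query, so it comes out ahead only if its own overhead is smaller than the cheap-model cost it skips. Whether it also matches the cascade's quality depends on how well it predicts difficulty without seeing the cheap model's output, which this sketch does not model.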
Abstract
Model cascades, in which a cheap LLM defers to an expensive one on low-confidence queries, are widely used to navigate the cost-quality tradeoff at deployment. Existing approaches largely treat the deferral threshold as an empirical hyperparameter, with limited guidance on the geometry of the resulting cost-quality frontier over a model pool. We develop a decision-theoretic framework grounded in constrained optimization and duality. For a two-model cascade, we establish piecewise concavity of the cost-quality frontier on decreasing-benefit regions of the confidence support, with reciprocal shadow prices linking the budget- and quality-constrained formulations. Given a pool of models, we characterize the frontier achievable by deterministic two-model threshold cascades as the pointwise envelope over pairwise cascades, with switching points where the optimal pair…
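The two-model threshold cascade described in the abstract can be sketched by sweeping the deferral threshold and recording the resulting cost and quality. The example below is illustrative only: the data are synthetic, the cheap model is assumed to be perfectly calibrated, the expensive model is assumed to be correct 90% of the time, and `cascade_frontier` and all parameter names are hypothetical rather than taken from the paper.

```python
import random


def cascade_frontier(conf, small_correct, large_correct,
                     c_small, c_large, thresholds):
    """Return a list of (threshold, mean cost, mean accuracy) tuples.

    A query is deferred to the expensive model when the cheap model's
    confidence falls below the threshold. Deferred queries pay for both
    models; the rest pay only for the cheap one.
    """
    n = len(conf)
    points = []
    for t in thresholds:
        total_cost = 0.0
        total_correct = 0
        for s, ys, yl in zip(conf, small_correct, large_correct):
            if s < t:  # low confidence: escalate
                total_cost += c_small + c_large
                total_correct += yl
            else:      # confident enough: keep the cheap answer
                total_cost += c_small
                total_correct += ys
        points.append((t, total_cost / n, total_correct / n))
    return points


# Synthetic data (assumption): the cheap model's confidence equals its
# probability of being correct, and the expensive model is right 90% of the time.
random.seed(0)
n = 2000
conf = [random.random() for _ in range(n)]
small_correct = [int(random.random() < s) for s in conf]
large_correct = [int(random.random() < 0.9) for _ in range(n)]

thresholds = [i / 10 for i in range(11)]
frontier = cascade_frontier(conf, small_correct, large_correct,
                            c_small=1.0, c_large=10.0, thresholds=thresholds)

for t, cost, acc in frontier:
    print(f"threshold={t:.1f}  cost={cost:.2f}  accuracy={acc:.3f}")
```

Each threshold yields one point on an empirical cost-quality curve. Repeating the sweep for every pair of models in a pool and taking the best accuracy at each cost level gives a rough numerical analogue of the pairwise envelope mentioned in the abstract.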
