Is Escalation Worth It? A Decision-Theoretic Characterization of LLM Cascades

Dylan Bouchard

arXiv:2605.06350·cs.LG·May 8, 2026

TL;DR

This paper introduces a decision-theoretic framework for model cascades, analyzing cost-quality tradeoffs and optimal strategies for deploying multiple language models efficiently.

Contribution

It develops a formal optimization approach for cascade design, characterizes the cost-quality frontier, and validates findings on multiple benchmarks.

Findings

01

Deterministic fixed chains underperform pairwise cascades.

02

Optimized subsequence cascades do not significantly outperform the pairwise envelope.

03

A lightweight router often surpasses cascade policies because queries it sends straight to the expensive model never incur the cheap model's cost.
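The cost gap behind the third finding can be illustrated with a little arithmetic. The sketch below is only an illustration, not the paper's analysis: the function names, the per-query costs, and the escalation fraction are all hypothetical, and it assumes the router's own cost is negligible and that both policies send the same fraction of queries to the expensive model.

```python
def cascade_cost(c_cheap: float, c_exp: float, p_escalate: float) -> float:
    """Expected per-query cost of a two-model cascade.

    The cheap model runs on every query; the expensive model runs only
    on the escalated fraction (hypothetical accounting, for illustration).
    """
    return c_cheap + p_escalate * c_exp


def router_cost(c_cheap: float, c_exp: float, p_expensive: float) -> float:
    """Expected per-query cost of a router that picks exactly one model.

    Assumes the router itself is free, which is a simplification.
    """
    return (1.0 - p_expensive) * c_cheap + p_expensive * c_exp


# Hypothetical numbers: the cheap model costs 1 unit, the expensive one 10,
# and 30% of queries end up on the expensive model under either policy.
gap = cascade_cost(1.0, 10.0, 0.3) - router_cost(1.0, 10.0, 0.3)
print(f"cascade pays {gap:.2f} extra units per query")
```

Under these assumptions the gap equals the escalated fraction times the cheap model's cost. The sketch says nothing about quality, which depends on how well each policy picks the queries to send to the expensive model.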

Abstract

Model cascades, in which a cheap LLM defers to an expensive one on low-confidence queries, are widely used to navigate the cost-quality tradeoff at deployment. Existing approaches largely treat the deferral threshold as an empirical hyperparameter, with limited guidance on the geometry of the resulting cost-quality frontier over a model pool. We develop a decision-theoretic framework grounded in constrained optimization and duality. For a two-model cascade, we establish piecewise concavity of the cost-quality frontier on decreasing-benefit regions of the confidence support, with reciprocal shadow prices linking the budget- and quality-constrained formulations. Given a pool of $k$ models, we characterize the frontier achievable by deterministic two-model threshold cascades as the pointwise envelope over $\binom{k}{2}$ pairwise cascades, with switching points where the optimal pair…
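As a rough illustration of the objects the abstract describes, the following sketch traces the empirical cost-quality points of a two-model threshold cascade and takes a budget-constrained envelope over several such cascades. It is not the paper's method: the function names, the per-query confidence and quality arrays, and the cost model (an escalated query pays for both models) are assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (threshold, mean cost, mean quality)


def cascade_frontier(
    conf: Sequence[float],      # cheap model's confidence per query (assumed given)
    q_cheap: Sequence[float],   # quality of the cheap model's answer per query
    q_exp: Sequence[float],     # quality of the expensive model's answer per query
    c_cheap: float,
    c_exp: float,
    thresholds: Sequence[float],
) -> List[Point]:
    """Sweep the deferral threshold and record mean cost and mean quality."""
    n = len(conf)
    points = []
    for t in thresholds:
        cost = 0.0
        quality = 0.0
        for s, qc, qe in zip(conf, q_cheap, q_exp):
            if s < t:
                # Low confidence: escalate, paying for both models.
                cost += c_cheap + c_exp
                quality += qe
            else:
                # High confidence: keep the cheap model's answer.
                cost += c_cheap
                quality += qc
        points.append((t, cost / n, quality / n))
    return points


def envelope_at_budget(frontiers: Sequence[List[Point]], budget: float) -> Optional[float]:
    """Best mean quality over all supplied cascades whose mean cost fits the budget."""
    feasible = [q for pts in frontiers for (_, c, q) in pts if c <= budget]
    return max(feasible) if feasible else None


# Tiny hypothetical example: two queries, one model pair.
pts = cascade_frontier(
    conf=[0.2, 0.8], q_cheap=[0.0, 1.0], q_exp=[1.0, 1.0],
    c_cheap=1.0, c_exp=10.0, thresholds=[0.0, 0.5, 1.1],
)
for t, c, q in pts:
    print(f"threshold={t:.1f} cost={c:.1f} quality={q:.2f}")
```

For a pool of models, one frontier per pair would be passed to `envelope_at_budget`, which evaluates the envelope at a single budget; sweeping the budget traces the envelope as a whole.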
