Cost-Saving LLM Cascades with Early Abstention

Michael J. Zellinger; Rex Liu; Matt Thomson

arXiv:2502.09054 · cs.AI · April 1, 2025

Open Access

TL;DR

This paper explores early abstention in LLM cascades, allowing smaller models to abstain before expensive models are invoked, which reduces costs and errors while maintaining performance in risk-sensitive domains.

Contribution

It introduces and empirically evaluates the concept of early abstention in LLM cascades, demonstrating its benefits in cost reduction and error minimization across multiple benchmarks.

Findings

1. Early abstention reduces overall test loss by 2.2%.

2. It decreases costs by 13.0% and error rates by 5.0%.

3. It makes abstention more effective by exploiting the correlation between the error patterns of small and large models.

Abstract

LLM cascades deploy small LLMs to answer most queries, limiting the use of large and expensive LLMs to difficult queries. This approach can significantly reduce costs without impacting performance. However, risk-sensitive domains such as finance or medicine place an additional premium on avoiding model errors. Since even the most expensive models are susceptible to making mistakes, applications in these domains benefit from allowing LLM systems to completely abstain from answering difficult queries. Introducing abstention poses a design question for LLM cascades: should abstention only be allowed at the final model or also at earlier models? Since the error patterns of small and large models are correlated, allowing earlier models to abstain may reduce inference costs and latency by anticipating abstention decisions by expensive and slow models, thus avoiding the need to run these…
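
The design question in the abstract can be illustrated with a minimal two-model sketch. It is not the authors' implementation: the confidence-threshold routing rule, the names `run_cascade`, `small_stub`, and `large_stub`, and all threshold and cost values are assumptions chosen only for illustration. The sketch shows how letting the small model abstain can skip the expensive call on a query that the large model would have abstained on anyway.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: a "model" is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeResult:
    answer: Optional[str]  # None means the cascade abstained
    decided_by: str        # "small" or "large"
    cost: float            # summed cost of the models actually called


def run_cascade(
    query: str,
    small: Model,
    large: Model,
    *,
    accept_above: float = 0.8,         # hypothetical: small model answers directly
    early_abstain_below: float = 0.2,  # hypothetical: small model abstains early
    final_abstain_below: float = 0.5,  # hypothetical: large model abstains
    small_cost: float = 1.0,           # hypothetical per-call costs
    large_cost: float = 20.0,
    allow_early_abstention: bool = True,
) -> CascadeResult:
    answer, conf = small(query)
    cost = small_cost

    # Confident small model: answer without calling the large model.
    if conf >= accept_above:
        return CascadeResult(answer, "small", cost)

    # Early abstention: give up before paying for the large model.
    if allow_early_abstention and conf < early_abstain_below:
        return CascadeResult(None, "small", cost)

    # Otherwise defer to the large model, which may still abstain.
    answer, conf = large(query)
    cost += large_cost
    if conf < final_abstain_below:
        return CascadeResult(None, "large", cost)
    return CascadeResult(answer, "large", cost)


# Stub models with correlated confidence: both are unsure on the "hard" query.
def small_stub(query: str) -> Tuple[str, float]:
    return {"easy": ("4", 0.95), "medium": ("7", 0.50), "hard": ("?", 0.05)}[query]


def large_stub(query: str) -> Tuple[str, float]:
    return {"easy": ("4", 0.99), "medium": ("7", 0.90), "hard": ("?", 0.30)}[query]


if __name__ == "__main__":
    for q in ("easy", "medium", "hard"):
        with_early = run_cascade(q, small_stub, large_stub)
        final_only = run_cascade(q, small_stub, large_stub, allow_early_abstention=False)
        print(q, with_early, final_only)
```

With these stub values, both configurations abstain on the "hard" query, but the early-abstention cascade does so after the small model alone, while the final-only cascade also pays for the large model. The trade-off is that an early-abstention threshold set too high would abstain on queries the large model could have answered correctly.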

Taxonomy

Topics: Reservoir Engineering and Simulation Methods