Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection

Guillem Ramírez; Alexandra Birch; Ivan Titov

arXiv:2405.02134·cs.CL·April 28, 2025

TL;DR

This paper introduces a simple two-tier selection method for optimising large language model calls: the small LLM's own uncertainty decides whether the large LLM is called. It balances cost and performance without any extra neural model and outperforms existing strategies in 25 of the 27 setups tested.

Contribution

The paper proposes a novel, simpler approach using only the small LLM's uncertainty as a decision criterion, eliminating the need for additional neural models in LLM call optimization.

Findings

01 Outperforms existing methods in 25 out of 27 setups

02 Balances cost and performance effectively

03 Works across multiple LLM pairs and tasks

Abstract

Researchers and practitioners operating on a limited budget face the cost-performance trade-off dilemma. The challenging decision often centers on whether to use a large LLM with better performance or a smaller one with reduced costs. This has motivated recent research in the optimisation of LLM calls. Either a cascading strategy is used, where a smaller LLM or both are called sequentially, or a routing strategy is used, where only one model is ever called. Both scenarios are dependent on a decision criterion which is typically implemented by an extra neural model. In this work, we propose a simpler solution; we use only the uncertainty of the generations of the small LLM as the decision criterion. We compare our approach with both cascading and routing strategies using three different pairs of pre-trained small and large LLMs, on nine different tasks and against approaches that require…
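The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uncertainty measure (one minus the margin between the two most probable outputs), the `ModelFn` interface, the threshold value, and the toy models are all assumptions made for illustration. The sketch calls the small model first and falls back to the large model only when the small model's uncertainty exceeds the threshold.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface (an assumption, not from the paper): a model call
# returns its answer plus a probability distribution over candidate outputs.
ModelFn = Callable[[str], Tuple[str, Sequence[float]]]


def margin_uncertainty(probs: Sequence[float]) -> float:
    """Return 1 - (p_top1 - p_top2); higher means less certain.

    This margin-based score is one possible uncertainty measure,
    chosen here for illustration only.
    """
    ranked = sorted(probs, reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return 1.0 - (ranked[0] - second)


def two_tier_call(
    query: str, small: ModelFn, large: ModelFn, threshold: float
) -> Tuple[str, str]:
    """Answer with the small model unless it is too uncertain.

    Returns (answer, tier), where tier is "small" or "large".
    """
    answer, probs = small(query)
    if margin_uncertainty(probs) <= threshold:
        return answer, "small"
    # Small model is too uncertain: pay for the large model.
    large_answer, _ = large(query)
    return large_answer, "large"


# Toy stand-ins for real LLM calls (hypothetical).
def toy_small(query: str) -> Tuple[str, Sequence[float]]:
    if "easy" in query:
        return "A", [0.9, 0.1]
    return "A", [0.55, 0.45]


def toy_large(query: str) -> Tuple[str, Sequence[float]]:
    return "B", [0.2, 0.8]


if __name__ == "__main__":
    print(two_tier_call("easy question", toy_small, toy_large, threshold=0.5))
    print(two_tier_call("hard question", toy_small, toy_large, threshold=0.5))
```

In a sketch like this, the threshold is the knob that trades cost against performance: raising it sends fewer queries to the large model, lowering it sends more.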

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling