Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference

Bo-Wei Chen; Chung-Chi Chen; An-Zi Yen

arXiv:2602.22090 · cs.CL · February 26, 2026

TL;DR

This paper introduces a confidence-driven multi-scale model selection method that dynamically chooses among models of different sizes based on confidence estimates. It maintains accuracy while reducing the computational cost of large language model inference by 20% to 40% on MMLU.

Contribution

It presents a novel confidence-based approach for selecting models during inference, improving cost efficiency without sacrificing accuracy.

Findings

1. Achieves accuracy comparable to the largest model on MMLU with 20% to 40% less computation.
2. Reduces token usage by approximately 60% in GPT-4o API calls.
3. Demonstrates effectiveness in resource-constrained settings such as edge devices.
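As a rough illustration of where savings in the quoted range can come from, assume a simplified two-model cascade (the paper itself selects among multiple scales). The small model, with per-query cost c_S, runs on every query; a fraction r of queries is retained; the rest are escalated to the large model, with cost c_L. These symbols are introduced here for illustration only and are not the paper's notation.

```latex
\mathbb{E}[C] = c_S + (1 - r)\,c_L,
\qquad
\text{saving} = 1 - \frac{\mathbb{E}[C]}{c_L} = r - \frac{c_S}{c_L}.
```

For example, hypothetical values r = 0.5 and c_S/c_L = 0.2 give a saving of 0.3, i.e. 30% relative to always calling the large model. Under this model the cascade saves anything only when r exceeds c_S/c_L.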

Abstract

Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing a model's confidence in handling the task and response accuracy, tasks that are likely to be solved correctly are retained, while more uncertain or complex cases are delegated to a larger model, ensuring reliability while minimizing computation. Specifically, we evaluate a model's likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach achieves accuracy comparable to the largest model while reducing computational costs by 20% to 40%. When applied to…
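The abstract describes a cascade: a smaller model keeps a query when its confidence is high, and uncertain cases move to a larger model. Below is a minimal Python sketch of that control flow under stated assumptions. Each model is a hypothetical callable that returns an answer, a "knows the answer" score, a "response is accurate" score, and a cost. A query is retained only when both scores clear per-model thresholds. The names `Stage`, `cascade`, `tau_know`, and `tau_correct`, the threshold values, and the toy models are illustrative and not taken from the paper, which may estimate and combine the two confidences differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface (not from the paper):
# question -> (answer, p_know, p_correct, cost)
ModelFn = Callable[[str], Tuple[str, float, float, float]]


@dataclass
class Stage:
    name: str
    run: ModelFn
    tau_know: float     # hypothetical threshold on "model knows the answer"
    tau_correct: float  # hypothetical threshold on "response is accurate"


def cascade(question: str, stages: List[Stage]) -> Tuple[str, str, float]:
    """Query models from smallest to largest and stop at the first confident one.

    Returns (answer, name of the answering model, total cost spent).
    The last stage always answers because there is no larger model to defer to.
    """
    if not stages:
        raise ValueError("stages must be non-empty")
    total_cost = 0.0
    for i, stage in enumerate(stages):
        answer, p_know, p_correct, cost = stage.run(question)
        total_cost += cost  # escalated queries pay for every model tried
        is_last = i == len(stages) - 1
        confident = p_know >= stage.tau_know and p_correct >= stage.tau_correct
        if is_last or confident:
            return answer, stage.name, total_cost
    raise AssertionError("unreachable: the last stage always returns")


# Toy stand-ins: the small model is confident only on "easy" questions.
def small_model(q: str) -> Tuple[str, float, float, float]:
    if "easy" in q:
        return ("A", 0.9, 0.85, 1.0)
    return ("B", 0.4, 0.3, 1.0)


def large_model(q: str) -> Tuple[str, float, float, float]:
    return ("C", 0.95, 0.9, 10.0)


stages = [
    Stage("small", small_model, tau_know=0.7, tau_correct=0.7),
    Stage("large", large_model, tau_know=0.0, tau_correct=0.0),
]
print(cascade("easy question", stages))  # ('A', 'small', 1.0)
print(cascade("hard question", stages))  # ('C', 'large', 11.0)
```

Requiring both scores to pass is one conservative way to combine them. A product of the two scores or a learned combiner would be equally plausible. The escalated query costs 11.0 rather than 10.0 because the small model's attempt is still paid for.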

Videos

Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education