Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference
Bo-Wei Chen, Chung-Chi Chen, An-Zi Yen

TL;DR
This paper introduces a confidence-driven multi-scale model selection method that dynamically chooses models based on confidence estimates to maintain accuracy while significantly reducing computational costs in large language model inference.
Contribution
It presents a novel confidence-based approach for selecting models during inference, improving cost efficiency without sacrificing accuracy.
Findings
Achieves accuracy comparable to the largest model on the MMLU benchmark while using 20-40% less computation.
Reduces token usage by approximately 60% in GPT-4o API calls.
Demonstrates effectiveness in resource-constrained settings like edge devices.
Abstract
Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing a model's confidence in handling the task and response accuracy, tasks that are likely to be solved correctly are retained, while more uncertain or complex cases are delegated to a larger model, ensuring reliability while minimizing computation. Specifically, we evaluate a model's likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach achieves accuracy comparable to the largest model while reducing computational costs by 20% to 40%. When applied to…
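The escalation logic described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ModelTier` type, the relative `cost` values, the two thresholds, and the rule that both confidence signals must clear their threshold are all assumptions made for the example, and the stub lambdas stand in for real models whose confidence estimates the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ModelTier:
    """Hypothetical wrapper around one model in the cascade."""
    name: str
    cost: float  # assumed relative cost of one query to this model
    # Returns (answer, p_know, p_correct): the answer plus the two
    # confidence signals named in the abstract. How they are estimated
    # is not covered here.
    answer_fn: Callable[[str], Tuple[str, float, float]]


def select_and_answer(
    question: str,
    tiers: List[ModelTier],
    know_threshold: float = 0.5,     # assumed value, not from the paper
    correct_threshold: float = 0.7,  # assumed value, not from the paper
) -> Tuple[str, str, float]:
    """Try tiers from smallest to largest; keep the first confident answer.

    Returns (answer, name of the tier that answered, total cost spent).
    """
    total_cost = 0.0
    for i, tier in enumerate(tiers):
        answer, p_know, p_correct = tier.answer_fn(question)
        total_cost += tier.cost
        is_last = i == len(tiers) - 1
        confident = p_know >= know_threshold and p_correct >= correct_threshold
        # The largest tier always answers; smaller tiers only when confident.
        if is_last or confident:
            return answer, tier.name, total_cost
    raise ValueError("tiers must not be empty")


# Stub models for illustration only.
small = ModelTier(
    "small", 1.0,
    lambda q: ("B", 0.9, 0.8) if "easy" in q else ("A", 0.3, 0.4),
)
large = ModelTier("large", 10.0, lambda q: ("C", 0.95, 0.9))

print(select_and_answer("easy question", [small, large]))
print(select_and_answer("hard question", [small, large]))
```

In this sketch an escalated query pays for both the small and the large model, so any saving depends on how often the small model is confident enough to keep the query.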
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
