Leveraging Uncertainty Estimation for Efficient LLM Routing
Tuo Zhang, Asal Mehradfar, Dimitrios Dimitriadis, Salman Avestimehr

TL;DR
This paper introduces the Confidence-Driven LLM Router, which uses uncertainty estimation to make routing decisions that balance cost against response quality. Response quality is judged with LLM-as-a-Judge, which simulates human preferences, and the authors report that the router outperforms existing methods.
Contribution
The paper presents a framework that applies uncertainty estimation to LLM routing, assesses response quality from a human-preference perspective using LLM-as-a-Judge, and reports better performance than existing routing methods.
Findings
Outperforms state-of-the-art routing methods in experiments
Achieves better response quality while maintaining cost efficiency
Introduces LLM-as-a-Judge for response quality assessment
Abstract
Deploying large language models (LLMs) in edge-cloud environments requires an efficient routing strategy to balance cost and response quality. Traditional approaches prioritize either human-preference data or accuracy metrics from benchmark datasets as routing criteria, but these methods suffer from rigidity and subjectivity. Moreover, existing routing frameworks primarily focus on accuracy and cost, neglecting response quality from a human preference perspective. In this work, we propose the Confidence-Driven LLM Router, a novel framework that leverages uncertainty estimation to optimize routing decisions. To comprehensively assess routing performance, we evaluate both system cost efficiency and response quality. In particular, we introduce the novel use of LLM-as-a-Judge to simulate human rating preferences, providing the first systematic assessment of response quality across…
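The abstract does not say how the router turns uncertainty into a decision, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the uncertainty score is the mean token-level entropy of a small edge model's output distributions, and that a fixed threshold decides whether to answer on the edge or escalate to the cloud. The names mean_token_entropy and route, the threshold value, and the "edge"/"cloud" labels are all hypothetical.

```python
import math
from typing import Sequence


def mean_token_entropy(token_distributions: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over per-token probability distributions.

    Assumption: each inner sequence is a probability distribution over
    candidate tokens produced by the small (edge) model.
    """
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    total = 0.0
    for dist in token_distributions:
        # Skip zero-probability entries, since 0 * log(0) is taken as 0.
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_distributions)


def route(token_distributions: Sequence[Sequence[float]], threshold: float = 0.5) -> str:
    """Return "edge" when the small model looks confident, otherwise "cloud".

    The threshold is a hypothetical tuning knob: raising it keeps more
    queries on the cheaper edge model, lowering it sends more to the cloud.
    """
    uncertainty = mean_token_entropy(token_distributions)
    return "edge" if uncertainty <= threshold else "cloud"


# Peaked distributions (low entropy) versus near-uniform ones (high entropy).
confident = [[0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]

decision_confident = route(confident)
decision_uncertain = route(uncertain)
```

In this sketch the threshold is where the trade-off between cost and quality is set. A real system would calibrate it against measured response quality instead of fixing it by hand.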
Taxonomy
Topics: Smart Grid Security and Resilience · Network Traffic and Congestion Control · Network Time Synchronization Technologies
