Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization
Yu-Neng Chuang, Leisheng Yu, Guanchu Wang, Lizhe Zhang, Zirui Liu, Xuanting Cai, Yang Sui, Vladimir Braverman, Xia Hu

TL;DR
This paper investigates uncertainty-based routing strategies for on-device large language models, aiming to balance efficiency and accuracy by offloading complex queries to stronger models, and proposes methods to improve generalization across datasets.
Contribution
It provides a comprehensive benchmarking of uncertainty-driven routing strategies and introduces a calibration data construction pipeline to enhance generalization to new datasets.
Findings
Uncertainty-correctness alignment affects routing performance.
Uncertainty distributions depend more on the SLM and the uncertainty quantification (UQ) method than on the data.
Calibration data improves routing generalization without new data.
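The first finding, that routing quality depends on how well uncertainty aligns with correctness, can be illustrated with a small sketch. The metric used here (AUROC, the probability that an incorrect answer receives a higher uncertainty score than a correct one) is an assumption chosen for illustration, not necessarily the measure used in the paper, and the function name `alignment_auroc` is hypothetical.

```python
from typing import Sequence

def alignment_auroc(uncertainties: Sequence[float], correct: Sequence[bool]) -> float:
    """Estimate how well uncertainty separates wrong answers from right ones.

    Returns the fraction of (incorrect, correct) answer pairs in which the
    incorrect answer has the higher uncertainty; ties count as half.
    1.0 means perfect alignment, 0.5 means uncertainty is uninformative.
    """
    wrong = [u for u, c in zip(uncertainties, correct) if not c]
    right = [u for u, c in zip(uncertainties, correct) if c]
    if not wrong or not right:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for w in wrong:
        for r in right:
            if w > r:
                wins += 1.0
            elif w == r:
                wins += 0.5
    return wins / (len(wrong) * len(right))

# Illustrative scores only (not results from the paper).
well_aligned = alignment_auroc([0.9, 0.8, 0.1, 0.2], [False, False, True, True])
misaligned = alignment_auroc([0.1, 0.9], [False, True])
```

With a score near 1.0, a simple uncertainty threshold would offload mostly the queries the SLM gets wrong; with a score near 0.5, the same threshold would offload queries more or less at random.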
Abstract
Large language models (LLMs) are increasingly deployed and democratized on edge devices. To improve the efficiency of on-device deployment, small language models (SLMs) are often adopted for their low decoding latency and reduced energy consumption. However, SLMs often generate inaccurate responses when handling complex queries. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces a low-confidence response. This follows the principle of "If you lack confidence, seek stronger support" to enhance reliability. Relying on more powerful LLMs is effective but increases invocation costs. Therefore, striking a routing balance between efficiency and efficacy remains a critical challenge. Additionally, efficiently generalizing the routing strategy to new datasets remains under-explored. In this paper,…
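The principle "If you lack confidence, seek stronger support" can be sketched as a threshold rule. This is a minimal illustration under stated assumptions, not the paper's implementation: the `route` function, the `SLM`/`LLM` interfaces, the toy models, and the threshold value are all hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: the SLM returns (answer, uncertainty score),
# while the stronger LLM returns an answer only.
SLM = Callable[[str], Tuple[str, float]]
LLM = Callable[[str], str]

def route(query: str, slm: SLM, llm: LLM, threshold: float) -> Tuple[str, str]:
    """Answer on-device when the SLM is confident, otherwise offload."""
    answer, uncertainty = slm(query)
    if uncertainty <= threshold:
        return answer, "slm"   # confident enough: keep the cheap answer
    return llm(query), "llm"   # low confidence: pay for the stronger model

# Toy stand-ins for real models (illustrative only).
def toy_slm(query: str) -> Tuple[str, float]:
    # Pretend longer queries are harder, so uncertainty grows with length.
    return "slm-answer", min(1.0, len(query) / 40)

def toy_llm(query: str) -> str:
    return "llm-answer"

easy = route("2 + 2?", toy_slm, toy_llm, threshold=0.5)
hard = route("Prove that the sum of two odd integers is always even.",
             toy_slm, toy_llm, threshold=0.5)
```

Raising the threshold keeps more queries on the device and lowers invocation cost, while lowering it offloads more queries to the stronger model. That is the efficiency versus efficacy trade-off the abstract describes.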
Taxonomy
Topics: Digital Rights Management and Security · Power Line Communications and Noise · Digital Platforms and Economics
Methods: Sparse Evolutionary Training
