Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization

Yu-Neng Chuang; Leisheng Yu; Guanchu Wang; Lizhe Zhang; Zirui Liu; Xuanting Cai; Yang Sui; Vladimir Braverman; Xia Hu

arXiv:2502.04428·cs.CL·February 10, 2025

Open Access

TL;DR

This paper investigates uncertainty-based routing strategies for on-device large language models, aiming to balance efficiency and accuracy by offloading complex queries to stronger models, and proposes methods to improve generalization across datasets.

Contribution

It provides a comprehensive benchmarking of uncertainty-driven routing strategies and introduces a calibration data construction pipeline to enhance generalization to new datasets.

Findings

01. Uncertainty-correctness alignment affects routing performance.

02. Uncertainty distributions are more dependent on the SLM and UQ method than on data.

03. Calibration data improves routing generalization without new data.

Abstract

Large language models (LLMs) are increasingly deployed and democratized on edge devices. To improve the efficiency of on-device deployment, small language models (SLMs) are often adopted for their low decoding latency and reduced energy consumption. However, these SLMs often generate inaccurate responses when handling complex queries. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses. This follows the principle of "If you lack confidence, seek stronger support" to enhance reliability. Relying on more powerful LLMs is effective but increases invocation costs. Therefore, striking a routing balance between efficiency and efficacy remains a critical challenge. Additionally, efficiently generalizing the routing strategy to new datasets remains under-explored. In this paper,…
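The routing principle described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the uncertainty score used here (mean negative log-probability of the generated tokens) is only one possible choice assumed for illustration, and the names `mean_neg_log_prob`, `route`, `toy_slm`, `toy_llm`, and the threshold value are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical interface: an SLM returns its answer plus per-token probabilities.
SlmFn = Callable[[str], Tuple[str, List[float]]]
LlmFn = Callable[[str], str]


def mean_neg_log_prob(token_probs: List[float]) -> float:
    """Assumed uncertainty score: average negative log-probability of the
    generated tokens. Higher values mean the SLM was less confident.
    Probabilities must lie in (0, 1]."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def route(query: str, slm: SlmFn, llm: LlmFn, threshold: float) -> Tuple[str, str]:
    """Answer with the SLM first; offload to the stronger LLM only when the
    SLM's uncertainty exceeds the threshold. Returns (answer, model_used)."""
    answer, token_probs = slm(query)
    if mean_neg_log_prob(token_probs) > threshold:
        return llm(query), "llm"
    return answer, "slm"


# Toy stand-ins for real models, used only to exercise the routing logic.
def toy_slm(query: str) -> Tuple[str, List[float]]:
    if query == "2+2":
        return "4", [0.9, 0.95]          # confident tokens
    return "unsure", [0.2, 0.3]          # low-confidence tokens


def toy_llm(query: str) -> str:
    return "strong answer"


easy = route("2+2", toy_slm, toy_llm, threshold=0.5)
hard = route("a complex query", toy_slm, toy_llm, threshold=0.5)
```

In this sketch the threshold sets the trade-off the abstract mentions: lowering it sends more queries to the stronger LLM (higher invocation cost), while raising it keeps more queries on-device.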

Taxonomy

Topics: Digital Rights Management and Security · Power Line Communications and Noise · Digital Platforms and Economics

Methods: Sparse Evolutionary Training