TL;DR
This paper shows that zero-shot confidence signals such as average token log-probability can estimate whether a small LLM's answer is correct, matching or exceeding supervised baselines in-distribution and substantially outperforming them out-of-distribution. It also introduces retrieval-conditional self-assessment to strengthen the confidence signal.
Contribution
It shows zero-shot confidence estimation matches or exceeds supervised baselines and introduces retrieval-conditional self-assessment to enhance confidence signals.
Findings
Average token log-probability matches or exceeds supervised baselines on in-distribution AUROC (0.650-0.714 vs. 0.644-0.676).
Zero-shot signals outperform supervised methods out-of-distribution.
Retrieval-conditional self-assessment improves confidence estimation with lower latency.
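The findings above are stated in terms of AUROC. As a reminder of what that metric measures, here is a minimal sketch in plain Python: AUROC is the probability that a randomly chosen correct answer receives a higher confidence score than a randomly chosen incorrect one. The function name `auroc` and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (correct, incorrect) pairs in which the
    correct answer has the higher confidence score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # correct answers
    neg = [s for s, y in zip(scores, labels) if y == 0]  # incorrect answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only (not results from the paper).
confidences = [-0.2, -0.9, -0.4, -1.5, -0.3]  # e.g. average token log-probs
correct = [1, 0, 1, 0, 0]                     # 1 = answer was correct
print(auroc(confidences, correct))
```

A score of 0.5 corresponds to a confidence signal that ranks answers no better than chance, and 1.0 to a signal that ranks every correct answer above every incorrect one. The quadratic loop is fine for a few hundred or thousand queries; a rank-based implementation would be preferable at larger scale.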
Abstract
How reliably can a small language model estimate its own correctness? The answer determines whether local-to-cloud routing (escalating queries that a cheap local model cannot handle) can work without supervised training data. As inference costs dominate large language model (LLM) deployment budgets, routing most queries to a cheap local model while reserving expensive cloud calls for hard cases is an increasingly common cost-control strategy. We compare zero-shot confidence signals against RouteLLM-style supervised baselines across three 7-8B model families and two datasets (1,000 and 500 queries per model, respectively). Average token log-probability, which requires no training data, matches or exceeds supervised baselines in-distribution (area under the receiver operating characteristic curve, AUROC: 0.650-0.714 vs. 0.644-0.676) and substantially outperforms them out-of-distribution…
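To make the routing idea in the abstract concrete, the following is a minimal sketch of using average token log-probability as a zero-shot confidence signal. It assumes the local model exposes per-token log-probabilities for its generated answer. The function names, the `"local"`/`"cloud"` labels, and the threshold value of -0.5 are hypothetical choices for illustration, not values or interfaces from the paper.

```python
def mean_token_logprob(token_logprobs):
    """Average log-probability over the generated tokens.
    Values closer to 0 indicate a more confident generation."""
    if not token_logprobs:
        raise ValueError("empty generation: no token log-probabilities")
    return sum(token_logprobs) / len(token_logprobs)


def route(token_logprobs, threshold=-0.5):
    """Keep the local answer when confidence clears the threshold,
    otherwise escalate the query to the cloud model.
    The default threshold is an arbitrary placeholder (assumption)."""
    confidence = mean_token_logprob(token_logprobs)
    return "local" if confidence >= threshold else "cloud"


# Hypothetical per-token log-probs for two generated answers.
confident_answer = [-0.1, -0.2, -0.3]
uncertain_answer = [-1.0, -2.0]
print(route(confident_answer))  # local
print(route(uncertain_answer))  # cloud
```

In practice the threshold would be tuned on held-out queries to trade local accuracy against the fraction of queries sent to the cloud. No router training is involved, which is what makes the signal zero-shot.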
