Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

Luong N. Nguyen

arXiv:2605.02241·cs.AI·May 8, 2026

TL;DR

This paper shows that zero-shot confidence signals such as average token log-probability can estimate a small LLM's correctness as well as or better than supervised baselines in-distribution, and substantially better out-of-distribution. It also introduces retrieval-conditional self-assessment to improve confidence estimation.

Contribution

The paper shows that zero-shot confidence estimation matches or exceeds RouteLLM-style supervised baselines, and it introduces retrieval-conditional self-assessment to strengthen the confidence signal.

Findings

01

Token log-probability matches or exceeds supervised baselines in AUROC.

02

Zero-shot signals outperform supervised methods out-of-distribution.

03

Retrieval-conditional self-assessment improves confidence estimation with lower latency.
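The findings above are stated in terms of AUROC. As a reminder of what that number measures, here is a minimal sketch (not the paper's evaluation code): AUROC is the probability that a randomly chosen correct answer receives a higher confidence score than a randomly chosen incorrect one, with ties counted as half. The function name `auroc` and the toy inputs are illustrative assumptions.

```python
def auroc(scores, labels):
    """Pairwise AUROC of a confidence signal.

    scores: confidence values, higher means "more likely correct".
    labels: truthy if the model's answer was correct, falsy otherwise.
    Illustrative helper only; quadratic in the number of examples.
    """
    correct = [s for s, y in zip(scores, labels) if y]
    incorrect = [s for s, y in zip(scores, labels) if not y]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect example")

    wins = 0.0
    for c in correct:
        for i in incorrect:
            if c > i:
                wins += 1.0   # correct answer ranked above incorrect one
            elif c == i:
                wins += 0.5   # ties count as half a win
    return wins / (len(correct) * len(incorrect))
```

An AUROC of 0.5 corresponds to a signal that ranks correct and incorrect answers no better than chance, and 1.0 to a perfect ranking.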

Abstract

How reliably can a small language model estimate its own correctness? The answer determines whether local-to-cloud routing (escalating queries that a cheap local model cannot handle) can work without supervised training data. As inference costs dominate large language model (LLM) deployment budgets, routing most queries to a cheap local model while reserving expensive cloud calls for hard cases is an increasingly common cost-control strategy. We compare zero-shot confidence signals against RouteLLM-style supervised baselines across three 7-8B model families and two datasets (1,000 and 500 queries per model, respectively). Average token log-probability, which requires no training data, matches or exceeds supervised baselines in-distribution (Area Under the Receiver Operating Characteristic curve, AUROC, of 0.650-0.714 vs. 0.644-0.676) and substantially outperforms them out-of-distribution…
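The routing idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the local model returns one log-probability per generated token, and the function names and the threshold value of -0.5 are hypothetical. In practice the threshold would be chosen on held-out queries to balance cost against accuracy.

```python
def mean_token_logprob(token_logprobs):
    """Average per-token log-probability of a generated answer.

    token_logprobs: list of floats (each <= 0), one per generated token.
    """
    if not token_logprobs:
        raise ValueError("answer has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def route(token_logprobs, threshold=-0.5):
    """Keep the local answer if confidence clears the threshold, else escalate.

    threshold is a hypothetical value used only for illustration.
    """
    confidence = mean_token_logprob(token_logprobs)
    return "local" if confidence >= threshold else "cloud"
```

For example, `route([-0.1, -0.2])` keeps the answer local, while `route([-2.0, -1.5])` escalates it to the cloud model. Because the signal is read directly from the local model's output, this needs no labeled training data.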
