How Confident Is the First Token? An Uncertainty-Calibrated Prompt Optimization Framework for Large Language Model Classification and Understanding

Wei Chen; Guoyang Ju; Yuanyuan Qi

arXiv:2603.18009·cs.CL·March 20, 2026

Open Access

TL;DR

This paper introduces Log-Scale Focal Uncertainty (LSFU), an uncertainty measure for large language models that makes prompt optimization more reliable by distinguishing true confidence from spurious confidence induced by class priors.

Contribution

The paper proposes LSFU, a first-token-based uncertainty metric that incorporates class priors, and develops UCPOF, an uncertainty-calibrated prompt optimization framework that improves accuracy and reduces computational costs.

Findings

01. UCPOF improves accuracy by 6.03% over few-shot baselines.

02. UCPOF surpasses full RAG by 5.75% in average accuracy.

03. UCPOF reduces retrieval trigger rate by 50.66%.

Abstract

With the widespread adoption of large language models (LLMs) in natural language processing, prompt engineering and retrieval-augmented generation (RAG) have become mainstream approaches for enhancing LLM performance on complex tasks. However, LLMs generate outputs autoregressively, which makes output uncertainty inevitable. Because model performance is highly sensitive to prompt design, precise uncertainty measurement is crucial for reliable prompt optimization. For multi-class multiple-choice (understanding) tasks, conventional uncertainty measures based on output probabilities (e.g., entropy) treat all classes equally and ignore differences in class priors in pretraining corpora. Failing to distinguish spurious confidence (from priors) from true certainty (from contextual understanding) results in poor confidence calibration. To address this, we propose Log-Scale Focal Uncertainty (LSFU), a…
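The conventional baseline the abstract criticizes, entropy over the first-token class probabilities, can be sketched as follows. This is a minimal illustration and not the paper's LSFU metric, whose definition is cut off in the excerpt above. The function names `softmax` and `first_token_entropy`, the option labels, and the logit values are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def first_token_entropy(label_logits):
    """Shannon entropy (in nats) of the first-token distribution,
    restricted to the candidate class labels.

    label_logits: dict mapping a class label (e.g. "A") to the model's
    first-token logit for that label. The values are assumed inputs;
    obtaining them from a real model is out of scope for this sketch.
    """
    probs = softmax(list(label_logits.values()))
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for a four-option multiple-choice question.
confident = {"A": 6.0, "B": 0.5, "C": 0.2, "D": 0.1}
uncertain = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}

h_conf = first_token_entropy(confident)
h_unc = first_token_entropy(uncertain)

# Entropy depends only on the probability values, not on which class
# holds them, so swapping the labels leaves the score unchanged.
swapped = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 6.0}
h_swapped = first_token_entropy(swapped)
```

Because the score is identical whichever class carries the probability mass, entropy alone cannot tell whether a peaked distribution reflects contextual understanding or merely a label that is frequent in the pretraining corpus. That is the limitation the abstract attributes to conventional measures.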

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification