Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

Huizi Cui; Huan Ma; Qilin Wang; Yuhang Gao; Changqing Zhang

arXiv:2605.05777 · cs.CL · May 8, 2026

TL;DR

This paper introduces DisAAD, a novel method for estimating uncertainty in black-box LLMs using a lightweight proxy model trained via adversarial distillation, enabling real-time uncertainty quantification.

Contribution

DisAAD is a new adversarial distillation approach, built on a generation-discrimination architecture, that guides a small proxy model to estimate black-box LLM uncertainty without extensive sampling.

Findings

01. A proxy model with only 1% of the LLM's size can reliably quantify uncertainty.

02. DisAAD outperforms existing methods in real-time uncertainty estimation.

03. The approach effectively captures implicit information in black-box reasoning processes.

Abstract

Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet LLM hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only via APIs. Existing uncertainty quantification methods typically depend on computationally expensive repeated sampling or on access to internal parameters, which prevents real-time estimation and fails to capture information implicit in the black-box reasoning process. To address this issue, we propose Distribution-Aligned Adversarial Distillation (DisAAD), which introduces a generation-discrimination architecture to guide a lightweight proxy model to learn the high-quality regions of the output distribution of the black-box LLM, thus effectively endowing it with the ability to know whether the black-box LLM knows or not. Subsequently, we use the proxy model to…
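To make the cost of the sampling-based baselines mentioned in the abstract concrete, here is a minimal sketch of one common variant: query the model several times and take the entropy of the answers. This is not DisAAD and is not taken from the paper; `sampling_entropy` and the `sample_answer` callable are hypothetical names introduced for illustration, and answers are grouped by exact match after simple normalization, which is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Callable, Tuple


def sampling_entropy(
    prompt: str,
    sample_answer: Callable[[str], str],  # hypothetical: one stochastic call to the black-box LLM
    n_samples: int = 10,
) -> Tuple[float, int]:
    """Return (entropy in nats, number of model calls) for repeated sampling.

    Illustrative baseline only: answers are grouped by exact match after
    stripping whitespace and lowercasing (an assumption, not the paper's method).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    # Each sample costs one full call to the black-box model.
    answers = [sample_answer(prompt).strip().lower() for _ in range(n_samples)]
    counts = Counter(answers)

    # Shannon entropy of the empirical answer distribution: sum of p * log(1/p).
    entropy = sum((c / n_samples) * math.log(n_samples / c) for c in counts.values())
    return entropy, n_samples
```

An entropy of zero means every sample agreed; larger values mean the answers disagreed more. The estimate needs `n_samples` calls per prompt, which is the overhead that a single forward pass through a small proxy model would avoid.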
