Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
Huizi Cui, Huan Ma, Qilin Wang, Yuhang Gao, Changqing Zhang

TL;DR
This paper introduces DisAAD, a novel method for estimating uncertainty in black-box LLMs using a lightweight proxy model trained via adversarial distillation, enabling real-time uncertainty quantification.
Contribution
DisAAD is a new adversarial distillation approach, built on a generation-discrimination architecture, that guides a small proxy model to estimate black-box LLM uncertainty without extensive sampling.
Findings
A proxy model with only 1% of the LLM's size can reliably quantify uncertainty.
DisAAD outperforms existing methods in real-time uncertainty estimation.
The approach effectively captures implicit information in black-box reasoning processes.
Abstract
Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet LLM hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only via APIs. Existing uncertainty quantification methods typically depend on computationally expensive multiple sampling or internal parameters, which prevents real-time estimation and fails to capture information implicit in the black-box reasoning process. To address this issue, we propose Distribution-Aligned Adversarial Distillation (DisAAD), which introduces a generation-discrimination architecture to guide a lightweight proxy model to learn the high-quality regions of the output distribution of the black-box LLM, thus effectively endowing it with the ability to know whether the black-box LLM knows or not. Subsequently, we use the proxy model to…
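For context on the multiple-sampling baselines the abstract contrasts against, here is a minimal sketch of a sampling-based uncertainty score. It is not DisAAD and is not taken from the paper. The function name `answer_entropy` and the example answers are hypothetical, and exact string matching after normalization is a simplifying assumption standing in for whatever answer-equivalence check a real method would use.

```python
import math
from collections import Counter


def answer_entropy(samples):
    """Shannon entropy (in nats) of the empirical distribution over sampled answers.

    Assumption: answers are grouped by exact match after stripping whitespace
    and lowercasing, which is a crude stand-in for semantic equivalence.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


# Hypothetical answers, as if one prompt had been sent to the API several times.
confident = ["Paris", "paris", "Paris ", "Paris"]
uncertain = ["Paris", "Lyon", "Paris", "Lyon"]

print(answer_entropy(confident))  # all samples agree, so entropy is zero
print(answer_entropy(uncertain))  # an even two-way split gives ln(2)
```

Each score of this kind needs several API calls per query, which is the cost the abstract describes as preventing real-time estimation.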
