Quantifying Uncertainty in Natural Language Explanations of Large Language Models
Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

TL;DR
This paper introduces two metrics, verbalized uncertainty and probing uncertainty, for quantifying the uncertainty in LLM explanations. It finds that probing uncertainty correlates with explanation faithfulness, which supports assessing the trustworthiness of large language models.
Contribution
The paper proposes verbalized and probing uncertainty metrics for LLM explanations, and its empirical analysis shows that probing uncertainty correlates with explanation faithfulness.
Findings
Verbalized uncertainty is unreliable for estimating explanation confidence.
Probing uncertainty correlates with explanation faithfulness.
Lower probing uncertainty indicates more faithful explanations.
Abstract
Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, it is not certain that these explanations are reliable and reflect the LLM's behavior. In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs. To this end, we propose two novel metrics, Verbalized Uncertainty and Probing Uncertainty, to quantify the uncertainty of generated explanations. While verbalized uncertainty involves prompting the LLM to express its confidence in its explanations, probing uncertainty leverages sample and model perturbations as a means to quantify the uncertainty. Our empirical analysis of benchmark…
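The idea behind probing uncertainty can be illustrated with a minimal sketch. It assumes each perturbed run of the model returns a list of key tokens as its explanation, and it scores uncertainty as one minus the mean pairwise Jaccard agreement between those lists. The function names `jaccard` and `probing_uncertainty` and the choice of Jaccard agreement are illustrative assumptions, not the metric definition from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty explanations agree trivially
    return len(set_a & set_b) / len(set_a | set_b)


def probing_uncertainty(explanations):
    """Hypothetical score: 1 - mean pairwise agreement across explanations.

    `explanations` holds one list of key tokens per perturbed run
    (for example, per paraphrased input or per sampled generation).
    Returns 0.0 when all runs agree and 1.0 when no two runs overlap.
    """
    if len(explanations) < 2:
        raise ValueError("need at least two explanations to compare")
    pairs = list(combinations(explanations, 2))
    agreement = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - agreement


# Illustrative inputs only; these are not results from the paper.
stable_runs = [["great", "movie"], ["movie", "great"], ["great", "movie"]]
unstable_runs = [["great", "movie"], ["plot", "slow"], ["actor", "bad"]]
print(probing_uncertainty(stable_runs))
print(probing_uncertainty(unstable_runs))
```

Under this sketch, explanations that stay the same across perturbations receive a low score and explanations that change receive a high one, matching the intuition that lower probing uncertainty goes with more faithful explanations.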
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
