Pretrained LLMs Learn Multiple Types of Uncertainty
Roi Cohen, Omri Fahn, Gerard de Melo

TL;DR
This paper investigates how pretrained large language models inherently capture multiple types of uncertainty, which can be used to improve factual correctness and reduce hallucinations without additional training.
Contribution
It demonstrates that LLMs can implicitly encode various uncertainties in their latent space, and that unifying these uncertainties through instruction-tuning enhances correctness prediction.
Findings
Uncertainty can be linearly captured in the model's latent space.
LLMs encode multiple types of uncertainty useful for correctness prediction.
Unifying uncertainty types improves the model's ability to predict correctness.
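The first finding, that uncertainty is linearly recoverable from hidden states, can be illustrated with a linear probe. The sketch below is not the authors' method or code: it uses synthetic vectors in place of real LLM activations, assumes a single planted "correctness" direction, and all function names (`make_synthetic_hidden_states`, `train_linear_probe`, `probe_accuracy`) are hypothetical. It only shows what "linearly captured" means in practice: a logistic-regression probe fitted on hidden states predicts whether an answer was correct.

```python
import numpy as np


def make_synthetic_hidden_states(n=400, dim=32, seed=0):
    """Stand-in for LLM hidden states (assumption: not real model activations).

    Each vector is Gaussian noise shifted along one planted unit direction,
    with the sign of the shift given by a binary correct/incorrect label.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)
    shift = np.outer(2.0 * labels - 1.0, direction) * 2.0
    states = rng.normal(size=(n, dim)) + shift
    return states, labels


def train_linear_probe(states, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, dim = states.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(steps):
        logits = states @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the mean log-loss w.r.t. logits
        w -= lr * (states.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(states, labels, w, b):
    """Fraction of examples whose correctness the probe predicts correctly."""
    preds = (states @ w + b > 0).astype(int)
    return float((preds == labels).mean())


# Train on the first 300 examples, evaluate on the held-out 100.
states, labels = make_synthetic_hidden_states()
w, b = train_linear_probe(states[:300], labels[:300])
print("held-out probe accuracy:", probe_accuracy(states[300:], labels[300:], w, b))
```

With real models, `states` would be replaced by activations extracted from a chosen layer and `labels` by whether the model's answer to each question was correct. Fitting separate probes on different benchmarks would be one way to look for the multiple uncertainty types the findings describe, though the paper's own procedure may differ.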
Abstract
Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we study how well LLMs capture uncertainty without being explicitly trained to do so. We show that, if uncertainty is treated as a linear concept in the model's latent space, it may indeed be captured, even after only pretraining. We further show that, though unintuitive, LLMs appear to capture several different types of uncertainty, each of which can be useful for predicting correctness on a specific task or benchmark. Furthermore, we provide in-depth results such as demonstrating a correlation between our correctness prediction and the model's ability to abstain from misinformation using words, and…
Taxonomy
Topics: Natural Language Processing Techniques
