Truth-value judgment in language models: 'truth directions' are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

TL;DR
This paper investigates how large language models' truth-value directions are influenced by context, revealing that these directions are often context-sensitive and act as causal mediators in in-context inference.
Contribution
It provides a detailed analysis of the context sensitivity of truth-value directions in LLMs and demonstrates their causal role in in-context reasoning processes.
Findings
Truth-value directions are generally context sensitive.
Context often impacts probe outputs even when it should not.
Truth directions act as causal mediators in inference.
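The causal-mediator finding rests on an intervention that moves a premise's representation along a truth direction. The toy sketch below shows only that vector edit. It assumes the direction is already given and treats the step size `alpha` as a free parameter. `steer` and `projection` are hypothetical helpers, not code from the paper. Running the edited activation back through a model and measuring the effect on the hypothesis is outside what a standalone snippet can show.

```python
import math

def unit(v):
    """Normalise a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def projection(act, direction):
    """Signed length of act along the (normalised) direction."""
    d = unit(direction)
    return sum(a * di for a, di in zip(act, d))

def steer(act, direction, alpha):
    """Hypothetical intervention: shift act by alpha units along the direction.
    Only the component along the direction changes; the orthogonal part is kept."""
    d = unit(direction)
    return [a + alpha * di for a, di in zip(act, d)]

# Toy 2-D premise activation and an assumed truth direction.
premise = [0.5, 1.0]
direction = [3.0, 4.0]
edited = steer(premise, direction, -2.0)  # push the premise towards "false"

# The projection drops by exactly alpha (up to floating-point rounding).
print(projection(premise, direction), projection(edited, direction))
```

Because the edit only changes the component along the normalised direction, the orthogonal part of the activation stays untouched. That is the usual rationale for this kind of intervention when testing whether a direction mediates downstream behaviour.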
Abstract
Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as uncovering a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe's predictions are (most) sensitive to the presence of related sentences, and how to best characterize this kind of sensitivity. We do so by measuring different types of consistency errors that occur after probing an LLM whose inputs consist of hypotheses preceded by (negated) supporting and contradicting sentences. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these truth-value directions influences the position of…
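The abstract describes two pieces: probes built on truth directions, and consistency errors measured when a hypothesis is preceded by supporting or contradicting sentences. A minimal sketch of both follows. It assumes a difference-of-class-means ("mass-mean") direction with a logistic readout. The function names, the toy 2-D activations, and the two error types are hypothetical illustrations, not the paper's probes or its exact error taxonomy.

```python
import math

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mass_mean_direction(true_acts, false_acts):
    """Truth direction as the difference of class means, plus their midpoint."""
    mu_t, mu_f = mean(true_acts), mean(false_acts)
    direction = [t - f for t, f in zip(mu_t, mu_f)]
    midpoint = [(t + f) / 2 for t, f in zip(mu_t, mu_f)]
    return direction, midpoint

def probe(act, direction, midpoint):
    """Logistic readout of the projection, centred so the midpoint maps to 0.5."""
    z = dot([a - m for a, m in zip(act, midpoint)], direction)
    return 1 / (1 + math.exp(-z))

def consistency_errors(p_alone, p_support, p_contradict):
    """Hypothetical error types: a supporting premise should not lower the
    hypothesis' score, and a contradicting premise should not raise it."""
    errors = []
    if p_support < p_alone:
        errors.append("support lowered score")
    if p_contradict > p_alone:
        errors.append("contradiction raised score")
    return errors

# Toy 2-D "activations"; real ones would be hidden states from an LLM layer.
true_acts = [[1.0, 0.2], [1.2, -0.1], [0.8, 0.0]]
false_acts = [[-1.0, 0.1], [-0.9, -0.2], [-1.1, 0.1]]
direction, midpoint = mass_mean_direction(true_acts, false_acts)

# Made-up activations of one hypothesis under three contexts.
p_alone = probe([0.3, 0.0], direction, midpoint)
p_support = probe([0.6, 0.0], direction, midpoint)
p_contradict = probe([0.5, 0.0], direction, midpoint)
print(consistency_errors(p_alone, p_support, p_contradict))
# → ['contradiction raised score']
```

Difference of means is only one way to recover a direction; logistic-regression probes are a common alternative, and the sketch does not claim to match the probes or layers used in the paper.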
Taxonomy
Topics: Topic Modeling · Bayesian Modeling and Causal Inference
