The Trilemma of Truth in Large Language Models

Germans Savcisens; Tina Eliassi-Rad

arXiv:2506.23921 · cs.CL · November 18, 2025

TL;DR

This paper introduces sAwMIL, a probing framework that combines multiple-instance learning with conformal prediction to assess the veracity of information encoded in large language models. It finds that truth and falsehood are not encoded symmetrically and that models carry a third signal distinct from both.

Contribution

The study identifies flaws in existing probing methods and proposes sAwMIL, a new approach that improves the reliability of truthfulness assessment in LLMs by leveraging internal activations.
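
As a rough illustration of what multiple-instance learning over internal activations can look like, the sketch below treats one statement as a "bag" of per-token activation vectors and scores the bag by its highest-scoring token. This is a minimal sketch under stated assumptions, not the authors' implementation: max-pooling is only one common MIL aggregation, and the names `instance_score`, `bag_score`, and the toy weights are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def instance_score(weights: Vector, bias: float, activation: Vector) -> float:
    """Score one token's activation vector with a linear probe."""
    return sum(w * a for w, a in zip(weights, activation)) + bias


def bag_score(weights: Vector, bias: float, bag: List[Vector]) -> float:
    """Score a whole statement (a bag of token activations).

    Assumption: max-pooling, a standard MIL aggregation. The bag is only
    as positive as its most positive instance. sAwMIL itself may differ.
    """
    if not bag:
        raise ValueError("bag must contain at least one instance")
    return max(instance_score(weights, bias, x) for x in bag)


# Toy example: three tokens, two-dimensional "activations" (made-up values).
weights = [1.0, -1.0]
bias = 0.0
bag = [[0.0, 0.5], [1.0, 0.25], [0.5, 0.5]]

token_scores = [instance_score(weights, bias, x) for x in bag]
statement_score = bag_score(weights, bias, bag)
```

The point of the bag formulation is that only the statement carries a label, so the probe does not need to know in advance which token position holds the signal.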

Findings

01. Common probing methods are unreliable and sometimes worse than zero-shot prompting.

02. Truth and falsehood are not encoded symmetrically in LLMs.

03. LLMs encode a third signal, distinct from both true and false.

Abstract

The public often attributes human-like qualities to large language models (LLMs) and assumes they "know" certain things. In reality, LLMs encode information retained during training as internal probabilistic knowledge. This study examines existing methods for probing the veracity of that knowledge and identifies several flawed underlying assumptions. To address these flaws, we introduce sAwMIL (Sparse-Aware Multiple-Instance Learning), a multiclass probing framework that combines multiple-instance learning with conformal prediction. sAwMIL leverages internal activations of LLMs to classify statements as true, false, or neither. We evaluate sAwMIL across 16 open-source LLMs, including default and chat-based variants, on three new curated datasets. Our results show that (1) common probing methods fail to provide a reliable and transferable veracity direction and, in some settings, perform…
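
To make the conformal prediction component more concrete, here is a minimal sketch of generic split conformal prediction over the three labels named in the abstract. It assumes a probe that already outputs per-class probabilities. The helper names, the calibration data, and the choice of `1 - p(label)` as the nonconformity score are illustrative assumptions, and the abstract does not say how sAwMIL itself wires this step in.

```python
import math
from typing import Dict, List, Tuple

LABELS = ("true", "false", "neither")
Probs = Dict[str, float]


def conformal_threshold(calibration: List[Tuple[Probs, str]], alpha: float) -> float:
    """Split conformal threshold from held-out (probabilities, gold label) pairs.

    Nonconformity score (assumed here): 1 - probability of the gold label.
    """
    scores = sorted(1.0 - probs[label] for probs, label in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # rank of the conformal quantile
    if k > n:
        return 1.0  # too few calibration points: admit every label
    return scores[k - 1]


def prediction_set(probs: Probs, qhat: float) -> List[str]:
    """All labels whose nonconformity score does not exceed the threshold."""
    return [y for y in LABELS if 1.0 - probs[y] <= qhat]


# Made-up calibration data, purely for illustration.
calibration = [
    ({"true": 0.875, "false": 0.0625, "neither": 0.0625}, "true"),
    ({"true": 0.125, "false": 0.75, "neither": 0.125}, "false"),
    ({"true": 0.125, "false": 0.125, "neither": 0.75}, "neither"),
    ({"true": 0.625, "false": 0.25, "neither": 0.125}, "true"),
    ({"true": 0.25, "false": 0.5, "neither": 0.25}, "false"),
    ({"true": 0.25, "false": 0.25, "neither": 0.5}, "neither"),
    ({"true": 0.5, "false": 0.25, "neither": 0.25}, "false"),
]

qhat = conformal_threshold(calibration, alpha=0.25)
confident = prediction_set({"true": 0.75, "false": 0.125, "neither": 0.125}, qhat)
ambiguous = prediction_set({"true": 0.5, "false": 0.5, "neither": 0.0}, qhat)
```

A singleton set can be read as a confident prediction, while a larger set signals that the probe cannot separate the candidate labels at the chosen error level `alpha`.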

Code & Models

Repositories

carlomarxdk/trilemma-of-truth
PyTorch · Official

Datasets

carlomarxx/trilemma-of-truth
dataset · 133 dl

Taxonomy

Methods: Linear Layer · Support Vector Machine