Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing

Raoyuan Zhao; Abdullatif Köksal; Ali Modarressi; Michael A. Hedderich; Hinrich Schütze

arXiv:2505.21701·cs.CL·June 2, 2025

TL;DR

This paper investigates the reliability of knowledge probing methods for large language models. It reveals significant inconsistencies both within and across methods, which calls into question how well these methods identify true knowledge gaps.

Contribution

The study introduces a new evaluation process, based on input variations and quantitative metrics, that exposes critical inconsistencies in current knowledge probing techniques for LLMs.

Findings

01. Intra-method inconsistency: small prompt changes cause large variance in results.

02. Cross-method inconsistency: different probing methods often contradict each other.

03. Decision agreement across methods can be as low as 7%.

Abstract

The reliability of large language models (LLMs) is greatly compromised by their tendency to hallucinate, underscoring the need for precise identification of knowledge gaps within LLMs. Various methods for probing such gaps exist, ranging from calibration-based to prompting-based methods. To evaluate these probing methods, in this paper, we propose a new process based on using input variations and quantitative metrics. Through this, we expose two dimensions of inconsistency in knowledge gap probing. (1) Intra-method inconsistency: Minimal non-semantic perturbations in prompts lead to considerable variance in detected knowledge gaps within the same probing method; e.g., the simple variation of shuffling answer options can decrease agreement to around 40%. (2) Cross-method inconsistency: Probing methods contradict each other on whether a model knows the answer. Methods are highly…
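To make the notion of "agreement" concrete, here is a minimal sketch of how the fraction of matching know / don't-know decisions between two probing runs could be computed. The abstract excerpt does not define the paper's metrics, so the simple match rate below, the function names `agreement_rate` and `mean_pairwise_agreement`, and the toy decision lists are all assumptions made for illustration, not the authors' implementation or data.

```python
from itertools import combinations


def agreement_rate(decisions_a, decisions_b):
    """Fraction of questions on which two runs make the same decision.

    Each list holds one boolean per question: True means the probing
    method judged that the model knows the answer (assumed encoding).
    """
    if len(decisions_a) != len(decisions_b):
        raise ValueError("decision lists must cover the same questions")
    if not decisions_a:
        raise ValueError("decision lists must be non-empty")
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


def mean_pairwise_agreement(runs):
    """Average agreement_rate over all unordered pairs of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(agreement_rate(a, b) for a, b in pairs) / len(pairs)


# Hand-written toy decisions (not results from the paper):
# the same probing method applied before and after shuffling answer options.
original_order = [True, True, False, True, False]
shuffled_order = [True, False, False, False, True]

print(agreement_rate(original_order, shuffled_order))  # prints 0.4
```

The same `agreement_rate` helper can be applied to two different probing methods on a shared question set to illustrate cross-method agreement, and `mean_pairwise_agreement` extends it to more than two prompt variants or methods.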

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks