Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion

Denitsa Saynova; Lovisa Hagström; Moa Johansson; Richard Johansson; Marco Kuhlmann

arXiv:2410.14405 · cs.CL · July 2, 2025

TL;DR

This paper introduces a recipe for building datasets and applies interpretability methods to them to distinguish how language models predict facts, revealing different internal processes for exact fact recall, guesswork, and heuristics.

Contribution

It presents PrISM, a model-specific recipe for constructing datasets, and applies two interpretability techniques, causal tracing and information flow analysis, to differentiate four prediction scenarios in language models.

Findings

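1. Exact fact recall involves mid-range MLP sublayers.

2. Guesswork and heuristics rely on late MLP sublayers at the last token position.

3. Results for exact fact recall and generic language modeling confirm earlier conclusions about the importance of mid-range MLP sublayers for fact recall.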

Abstract

Language models (LMs) can make a correct prediction based on many possible signals in a prompt, not all of which correspond to recall of factual associations. However, current interpretations of LMs fail to take this into account. For example, given the query "Astrid Lindgren was born in" with the corresponding completion "Sweden", no distinction is made between a prediction based on knowing where the author was born and one based on assuming that a person with a Swedish-sounding name was born in Sweden. In this paper, we present a model-specific recipe, PrISM, for constructing datasets with examples of four different prediction scenarios: generic language modeling, guesswork, heuristics recall and exact fact recall. We apply two popular interpretability methods to the scenarios: causal tracing (CT) and information flow analysis. We find that both yield distinct results for each scenario.…
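To make the causal tracing (CT) idea concrete, here is a minimal sketch on a toy network, not the authors' implementation and not a real LM. CT compares three kinds of runs: a clean run, a run where the subject token embeddings are corrupted with Gaussian noise, and corrupted runs in which a single hidden state (one layer, one token position) is restored from the clean run. The gain in the target token's probability from each restoration indicates how much that state carries the prediction. The functions `run` and `causal_trace`, the prefix-mean "attention" stand-in, and the noise scale of 3.0 are all illustrative assumptions. The sketch also restores whole residual states, whereas the paper's findings concern MLP sublayers specifically, which would require restoring sublayer outputs instead.

```python
import numpy as np


def run(model, emb, patch=None):
    """Forward pass of a hypothetical toy causal residual network.

    model: (Ws, U), with Ws a list of (d, d) layer weights and U a (V, d) unembedding.
    emb:   (T, d) input embeddings.
    patch: optional (layer, position, vector); the hidden state at that layer
           (0 = embeddings) and position is overwritten with `vector`.
    Returns the hidden states [h_0, ..., h_L] and the next-token distribution.
    """
    Ws, U = model
    h = emb.copy()

    def apply_patch(layer, h):
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]  # in-place restoration of one hidden state

    apply_patch(0, h)
    states = [h.copy()]
    for layer, W in enumerate(Ws, start=1):
        # Each position reads the mean of its prefix: a crude, causal stand-in for attention.
        prefix_mean = np.cumsum(h, axis=0) / np.arange(1, len(h) + 1)[:, None]
        h = h + np.tanh(prefix_mean @ W)  # residual update with a nonlinearity
        apply_patch(layer, h)
        states.append(h.copy())
    logits = U @ h[-1]  # predict the next token from the last position
    probs = np.exp(logits - logits.max())
    return states, probs / probs.sum()


def causal_trace(model, emb, subject_positions, target, noise_scale=3.0, seed=0):
    """Return p_clean, p_corrupted and an (L+1, T) grid of restoration effects."""
    rng = np.random.default_rng(seed)
    clean_states, clean_probs = run(model, emb)

    # Corrupt only the subject token embeddings (noise_scale is an assumption).
    corrupted = emb.copy()
    corrupted[subject_positions] += noise_scale * rng.standard_normal(
        (len(subject_positions), emb.shape[1]))
    _, corrupt_probs = run(model, corrupted)

    n_layers = len(model[0])
    effects = np.zeros((n_layers + 1, emb.shape[0]))
    for layer in range(n_layers + 1):
        for pos in range(emb.shape[0]):
            # Corrupted run with exactly one clean hidden state restored.
            _, probs = run(model, corrupted, patch=(layer, pos, clean_states[layer][pos]))
            effects[layer, pos] = probs[target] - corrupt_probs[target]
    return clean_probs[target], corrupt_probs[target], effects


# Usage on random toy weights (values are arbitrary).
rng = np.random.default_rng(42)
T, d, L, V = 5, 16, 4, 10
model = ([rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)],
         rng.standard_normal((V, d)))
emb = rng.standard_normal((T, d))
target = int(np.argmax(run(model, emb)[1]))  # treat the clean top-1 token as the "fact"
p_clean, p_corrupt, effects = causal_trace(model, emb, subject_positions=[1, 2], target=target)
print(f"clean p={p_clean:.3f}, corrupted p={p_corrupt:.3f}")
print(np.round(effects, 3))  # rows: layers 0..L, columns: token positions
```

Two sanity properties follow from the design. Restoring the last position at the final layer recovers the clean probability exactly. Restoring any state that the noise never reached, such as positions before the subject, has no effect. In a real transformer, the same patching would be done with framework hooks on the chosen sublayer rather than by rewriting the forward pass.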

Code & Models

Datasets

copenlu/cub-counterfact (dataset)
