Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova, Lovisa Hagström, Moa Johansson, Richard Johansson, Marco Kuhlmann

TL;DR
This paper introduces a dataset construction recipe and applies existing interpretability methods to distinguish how language models predict facts, revealing different internal processes for recall, guesswork, and heuristics.
Contribution
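It presents PrISM, a model-specific recipe for constructing datasets that cover four prediction scenarios (generic language modeling, guesswork, heuristics recall, and exact fact recall), and applies causal tracing and information flow analysis to tell the scenarios apart.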
Findings
Exact fact recall involves mid-range MLP layers.
Guesswork and heuristics rely on late-layer MLP sublayers at the last token position.
For exact fact recall, the results confirm earlier conclusions about the importance of mid-range MLP layers.
Abstract
Language models (LMs) can make a correct prediction based on many possible signals in a prompt, not all corresponding to recall of factual associations. However, current interpretations of LMs fail to take this into account. For example, given the query "Astrid Lindgren was born in" with the corresponding completion "Sweden", no difference is made between whether the prediction was based on knowing where the author was born or assuming that a person with a Swedish-sounding name was born in Sweden. In this paper, we present a model-specific recipe - PrISM - for constructing datasets with examples of four different prediction scenarios: generic language modeling, guesswork, heuristics recall and exact fact recall. We apply two popular interpretability methods to the scenarios: causal tracing (CT) and information flow analysis. We find that both yield distinct results for each scenario.…
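As a rough illustration of the causal tracing idea mentioned in the abstract, the sketch below runs a toy residual network three times: once clean, once with a corrupted input, and once corrupted but with a single MLP sublayer output restored to its clean value. The gain in the target probability from that restoration is the layer's indirect effect. Everything here is an assumption made for illustration: the random toy network, the function names `forward` and `indirect_effects`, and the single-vector input are hypothetical and are not the authors' setup. Causal tracing on a real LM corrupts the subject token embeddings and restores states per layer and token position.

```python
import numpy as np


def forward(x, mlps, readout, patch=None):
    """Run a toy residual network: h <- h + tanh(W_i @ h) for each layer.

    patch is None or (layer_index, vector); if given, that layer's MLP
    output is overwritten with the vector before it is added to h.
    Returns (class probabilities, list of MLP outputs per layer).
    """
    h = x.copy()
    mlp_outputs = []
    for i, w in enumerate(mlps):
        out = np.tanh(w @ h)
        if patch is not None and patch[0] == i:
            out = patch[1]
        mlp_outputs.append(out)
        h = h + out
    logits = readout @ h
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum(), mlp_outputs


def indirect_effects(x, noise, mlps, readout, target):
    """Per-layer indirect effect of restoring the clean MLP output."""
    p_clean, clean_outputs = forward(x, mlps, readout)
    p_corrupt, _ = forward(x + noise, mlps, readout)
    effects = []
    for layer, clean_out in enumerate(clean_outputs):
        # Corrupted run, with one MLP output restored to its clean value.
        p_restored, _ = forward(x + noise, mlps, readout,
                                patch=(layer, clean_out))
        effects.append(float(p_restored[target] - p_corrupt[target]))
    return float(p_clean[target]), float(p_corrupt[target]), effects


# Hypothetical toy setup: random weights, random input, Gaussian corruption.
rng = np.random.default_rng(0)
dim, n_layers, n_classes = 8, 4, 5
mlps = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(n_layers)]
readout = rng.normal(size=(n_classes, dim))
x = rng.normal(size=dim)
noise = rng.normal(scale=3.0, size=dim)

p_clean, _ = forward(x, mlps, readout)
target = int(np.argmax(p_clean))  # the clean run's top prediction
p_clean_t, p_corrupt_t, effects = indirect_effects(
    x, noise, mlps, readout, target)

for layer, effect in enumerate(effects):
    print(f"layer {layer}: indirect effect = {effect:+.4f}")
```

Layers whose restoration recovers more of the clean prediction receive a larger indirect effect. Comparing such per-layer profiles across prediction scenarios is the kind of contrast the findings above describe, although the toy network says nothing about where those effects fall in a real model.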
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
