Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Bilal Chughtai, Alan Cooney, Neel Nanda

TL;DR
This paper investigates how large language models recall factual information, revealing that multiple independent mechanisms additively contribute to correct answers, and introduces methods to analyze these mechanisms and attention heads.
Contribution
It uncovers the additive mechanisms behind factual recall in LLMs and extends attribution techniques to better understand attention head contributions.
Findings
Factual recall involves multiple independent additive mechanisms.
Mechanisms interfere constructively to produce correct answers.
Extending direct logit attribution to individual source tokens reveals "mixed" attention heads, whose output combines contributions from different source tokens.
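As a rough illustration of the last finding, the sketch below splits one attention head's direct logit contribution by source token. It is a toy under stated assumptions, not the authors' implementation: the weights are random, the names (`x`, `W_V`, `W_O`, `W_U`, `attn`) are hypothetical, and biases and the final LayerNorm are ignored so that everything stays linear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_head, n_vocab = 4, 8, 3, 5  # toy sizes (assumed)

x = rng.normal(size=(n_tokens, d_model))    # residual stream at each source token
W_V = rng.normal(size=(d_model, d_head))    # value projection of one head
W_O = rng.normal(size=(d_head, d_model))    # output projection of the same head
W_U = rng.normal(size=(d_model, n_vocab))   # unembedding matrix

# Attention pattern of the final position over all source tokens (sums to 1).
attn = rng.random(n_tokens)
attn /= attn.sum()

# The head's output at the final position is a weighted sum over source tokens,
# so it can be kept as one residual-stream vector per source token.
per_source_out = attn[:, None] * (x @ W_V @ W_O)   # (n_tokens, d_model)
head_out = per_source_out.sum(axis=0)              # (d_model,)

# Unembedding is linear, so the logit contribution splits the same way.
per_source_logits = per_source_out @ W_U           # (n_tokens, n_vocab)
head_logits = head_out @ W_U                       # (n_vocab,)

# Per-source attributions add up exactly to the head's total attribution.
assert np.allclose(per_source_logits.sum(axis=0), head_logits)
```

Each row of `per_source_logits` is the share of the head's logit effect that comes from one source token. In a real model the final LayerNorm would need separate handling, for example by freezing its scale.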
Abstract
How do transformer-based large language models (LLMs) store and retrieve knowledge? We focus on the most basic form of this task -- factual recall, where the model is tasked with explicitly surfacing stored facts in prompts of the form "Fact: The Colosseum is in the country of". We find that the mechanistic story behind factual recall is more complex than previously thought. It comprises several distinct, independent, and qualitatively different mechanisms that additively combine, constructively interfering on the correct attribute. We term this generic phenomenon the additive motif: models compute through summing up multiple independent contributions. Each mechanism's contribution may be insufficient alone, but summing results in constructive interference on the correct answer. In addition, we extend the method of direct logit attribution to attribute an attention head's output to individual…
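The additive motif described in the abstract can be illustrated with a toy calculation. This is a minimal sketch with made-up numbers, not data from the paper: the mechanism names and logit values are hypothetical, chosen only to show how contributions that are individually insufficient can sum to the correct answer.

```python
import numpy as np

# Hypothetical candidate answers for "The Colosseum is in the country of".
answers = ["Italy", "France", "Rome"]

# Made-up direct logit contributions of three mechanisms to each candidate.
contributions = {
    "subject_heads": np.array([2.0, 0.5, 2.5]),   # boosts attributes of the subject
    "relation_heads": np.array([2.0, 2.5, 0.0]),  # boosts countries in general
    "mlp": np.array([1.0, 0.5, 0.5]),
}

def top_answer(logits):
    """Return the candidate with the largest logit."""
    return answers[int(np.argmax(logits))]

# What each mechanism would predict if it acted alone.
individual = {name: top_answer(v) for name, v in contributions.items()}

# Contributions are summed, so the final logits are the elementwise total.
total = sum(contributions.values())

print(individual)
print(total, top_answer(total))
```

In this toy setup the subject mechanism alone prefers "Rome" and the relation mechanism alone prefers "France", yet both give weight to "Italy", so only that candidate accumulates a large total.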
Peer Reviews
Decision: Submitted to ICLR 2024
- The paper uses established mechanistic interpretation tools and extends them to identify mechanisms in the transformer that perform very specific purposes
- The SUBJECT-head, RELATION-head, and MLP additive behaviors are established by showing consistent patterns across a range of fact queries
- The paper introduction and further discussions claim that the results reported here provide a mechanistic explanation for the limitations of LLMs to learn "B is A" from training on "A is B" [1]. However, I do not see sufficient evidence to support this claim
- They have shown that in the forward direction the transformer selectively promotes attributes relevant to the subject and the relation
- This does not show that the transformer CANNOT/DOES NOT perform the same operations in the r
(1) Backed by sufficient experimental verification, the authors identify and explain the internal mechanisms of LLMs at the granularity of attention heads and MLPs. More interestingly, the paper provides an explanation of the “reversal curse” phenomenon discovered in recent works. (2) This work thoroughly discusses related work and proposes a range of possible directions for future work.
(1) There have been many works [1, 2] interpreting the model behavior of Factual Recall. It seems that the novelty is insufficient with only a deeper zooming into attention heads using similar interpretability methods. Additionally, the discovery of the additive motif is not surprising enough, as already explained in work [3] that "Attention heads can be understood as independent operations, each outputting a result which is added into the residual stream." (2) Is direct logit attribution (DLA)
The study tackles an important/interesting problem and the paper reports a substantial amount of experimentation.
Although I believe the idea is interesting, and there may be some valuable finding in the paper, I have difficulties seeing a clear take-home message based on the results presented, and probably also due to the way they are presented. I have some concrete points of criticism listed in the comments below (with approximate order of importance).
- The main claim, additivity of the multiple mechanisms, is not very clearly demonstrated in the paper. The separation of the subject/relation heads (
Taxonomy
Topics: Artificial Intelligence in Law · Law, Economics, and Judicial Systems
Methods: Focus
