Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models
Dante Campregher, Yanxu Chen, Sander Hoffman, Maria Heuss

TL;DR
This reproducibility study examines how large language models resolve conflicts between learned facts and contradictory in-context information. It finds that attention heads promoting factual output do so through general copy suppression rather than selective counterfactual suppression, and that their behavior is domain-dependent.
Contribution
The paper provides a mechanistic analysis of attention heads in LLMs, clarifying their role in the competition between factual and counterfactual outputs and the domain specificity of their behavior, and attempts to reconcile conflicting findings from prior studies.
Findings
Attention heads that promote factual output do so via general copy suppression, not selective counterfactual suppression.
Attention head behavior varies across domains.
Larger models show more specialized attention patterns.
Abstract
This paper presents a reproducibility study examining how Large Language Models (LLMs) manage competing factual and counterfactual information, focusing on the role of attention heads in this process. We attempt to reproduce and reconcile findings from three recent studies (Ortu et al.; Yu, Merullo, and Pavlick; and McDougall et al.) that investigate the competition between model-learned facts and contradictory context information through Mechanistic Interpretability tools. Our study specifically examines the relationship between attention head strength and factual output ratios, evaluates competing hypotheses about attention heads' suppression mechanisms, and investigates the domain specificity of these attention patterns. Our findings suggest that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression, as…
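To make the phrase "relationship between attention head strength and factual output ratios" concrete, here is a minimal Python sketch. It is not the authors' procedure. It assumes the factual output ratio is the fraction of prompts where the factual token's logit exceeds the counterfactual token's, and it models a head as a fixed additive logit contribution scaled by a hypothetical strength factor `alpha`. The function names and all numbers are invented for illustration; a real experiment would intervene on attention weights inside a transformer.

```python
# Toy sketch (assumption-laden): how scaling one head's contribution could
# change the share of prompts that end with the factual answer.

def factual_output_ratio(logit_pairs):
    """Fraction of prompts whose factual logit beats the counterfactual logit.

    logit_pairs: iterable of (factual_logit, counterfactual_logit).
    """
    pairs = list(logit_pairs)
    if not pairs:
        raise ValueError("need at least one prompt")
    wins = sum(1 for fact, counter in pairs if fact > counter)
    return wins / len(pairs)


def scale_head(base_logits, head_contribution, alpha):
    """Hypothetical linear model: final logit = base + alpha * head contribution."""
    return tuple(b + alpha * h for b, h in zip(base_logits, head_contribution))


# Made-up data: ((base factual, base counterfactual), (head effect on each)).
# The head here lowers the logit of the token copied from context.
PROMPTS = [
    ((2.0, 3.0), (0.0, -0.8)),
    ((1.0, 2.5), (0.0, -0.5)),
    ((3.0, 2.0), (0.0, -0.2)),
]


def ratio_at(alpha):
    """Factual output ratio over PROMPTS with the head scaled by alpha."""
    return factual_output_ratio(
        scale_head(base, head, alpha) for base, head in PROMPTS
    )


for alpha in (0.0, 1.0, 2.0, 4.0):
    print(f"alpha={alpha}: factual ratio={ratio_at(alpha):.2f}")
```

In this toy setup the ratio rises as `alpha` grows because the head only suppresses the copied counterfactual token. Under the general copy suppression reading, the same head would also suppress a correct fact when that fact is the token present in context, which this sketch does not model.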
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
