On the Generalizability of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals"

Asen Dotsinski; Udit Thakur; Marko Ivanov; Mohammad Hafeez Khan; Maria Heuss

arXiv:2506.22977·cs.CL·July 1, 2025

TL;DR

This reproduction study examines how language models handle competing factual and counterfactual information. It finds that the original results vary with model size, prompt structure, and domain, and it questions how well attention head ablation holds up across domains and models.

Contribution

The paper extends Ortu et al. (2024) by testing whether their mechanism-competition findings generalize to a larger model (Llama 3.1 8B), to varied prompt structures, and to specific domains, highlighting the limitations and variability it observes.

Findings

01. Reduced attention head specialization in larger models

02. Prompt structure significantly affects counterfactual prediction

03. Attention head ablation effectiveness varies by domain and model

Abstract

We present a reproduction study of "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals" (Ortu et al., 2024), which investigates competition of mechanisms in language models between factual recall and counterfactual in-context repetition. Our study successfully reproduces their primary findings regarding the localization of factual and counterfactual information, the dominance of attention blocks in mechanism competition, and the specialization of attention heads in handling competing information. We reproduce their results on both GPT-2 (Radford et al., 2019) and Pythia 6.9B (Biderman et al., 2023). We extend their work in three significant directions. First, we explore the generalizability of these findings to even larger models by replicating the experiments on Llama 3.1 8B (Grattafiori et al., 2024), discovering greatly reduced attention head…

Taxonomy

Topics: Topic Modeling · Neurobiology of Language and Bilingualism · Text Readability and Simplification