Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori, Michael C. Mozer, Asma Ghandeharioun

TL;DR
This paper investigates why large language models make contextualization errors, proposing the LLM Race Conditions Hypothesis that attributes these errors to dependency violations during token integration, supported by mechanistic interpretability evidence.
Contribution
It introduces the LLM Race Conditions Hypothesis to explain contextualization errors and provides causal evidence and potential interventions for these failures.
Findings
Dependencies between tokens affect contextualization accuracy
Mechanistic interpretability supports the hypothesis
Inference-time interventions can mitigate errors
Abstract
The profound success of transformer-based language models can largely be attributed to their ability to integrate relevant contextual information from an input sequence in order to generate a response or complete a task. However, we know very little about the algorithms that a model employs to implement this capability, nor do we understand their failure modes. For example, given the prompt "John is going fishing, so he walks over to the bank. Can he make an ATM transaction?", a model may incorrectly respond "Yes" if it has not properly contextualized "bank" as a geographical feature, rather than a financial institution. We propose the LLM Race Conditions Hypothesis as an explanation of contextualization errors of this form. This hypothesis identifies dependencies between tokens (e.g., "bank" must be properly contextualized before the final token, "?", integrates information from…
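The dependency in the "bank" example can be pictured with a toy sketch. This is only an illustration of the race-condition analogy, not the paper's method: the `Token` class, the `resolved_at` field, the `reads_stale` helper, and all layer numbers are hypothetical and chosen purely for exposition.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    # Hypothetical: the layer at which this token's meaning is assumed
    # to be fully disambiguated from its context.
    resolved_at: int


def reads_stale(dependency: Token, read_layer: int) -> bool:
    """Return True if a later token reads `dependency` before it is resolved.

    This mirrors a race condition: the reader consumes a value
    before the writer has finished producing it.
    """
    return read_layer < dependency.resolved_at


# Illustrative numbers only; they are not measurements from the paper.
bank = Token("bank", resolved_at=12)

# The final "?" token pulls in "bank" too early: dependency violated.
early = reads_stale(bank, read_layer=8)

# The final "?" token pulls in "bank" after it is resolved: no violation.
late = reads_stale(bank, read_layer=16)

print(early, late)
```

In this picture, a contextualization error corresponds to the `early` case, where the question token integrates an ambiguous "bank" before the fishing context has disambiguated it.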
Taxonomy
Topics: Topic Modeling
