TL;DR
This paper identifies and manipulates sparse, entity-specific neurons in language models, showing that they play a causal role in factual recall and that they encode an entity's canonical identity rather than its surface form.
Contribution
It introduces a method that localizes entity cells by ranking MLP neurons for activation consistency across varied prompts about the same entity. It also provides causal evidence of their role in factual recall, highlighting their stability and surface-form invariance.
Findings
Across seven models, localized neurons cluster predominantly in early layers.
In Qwen2.5-7B, suppressing a localized neuron selectively erases recall for its matched entity while leaving other entities intact; activating it recovers the knowledge.
Entity cells encode canonical identity, not surface tokens, and are stable across tuning.
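The suppression and activation experiments amount to clamping one MLP neuron's activation during the forward pass. Below is a minimal sketch on a toy network, assuming a plain SiLU MLP rather than Qwen2.5's gated MLP; `mlp_forward` and its `override` argument are hypothetical helpers for illustration. In a real model the clamp would be applied with a hook on the layer that holds the localized neuron.

```python
import math


def silu(x):
    return x / (1.0 + math.exp(-x))


def mlp_forward(x, w_in, w_out, override=None):
    """Toy one-hidden-layer MLP: h = silu(W_in @ x), y = W_out @ h.

    override: optional {neuron_index: value}; each listed hidden neuron is
    clamped to that value before the output projection (hypothetical helper).
    """
    h = [silu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    for j, value in (override or {}).items():
        h[j] = value  # 0.0 suppresses neuron j; a large value activates it
    y = [sum(w * hj for w, hj in zip(row, h)) for row in w_out]
    return y, h


# Neuron 0 drives output 0 ("entity A"), neuron 1 drives output 1 ("entity B").
W_IN = [[3.0, 0.0], [0.0, 3.0]]
W_OUT = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]  # input that only excites neuron 0

y_clean, _ = mlp_forward(x, W_IN, W_OUT)
y_suppressed, _ = mlp_forward(x, W_IN, W_OUT, override={0: 0.0})
y_activated, _ = mlp_forward(x, W_IN, W_OUT, override={1: 5.0})
print(y_clean, y_suppressed, y_activated)
```

Suppressing neuron 0 zeroes the output it drives, and clamping neuron 1 high makes output 1 the largest even though the input never excited that neuron: a toy analogue of the erase and recover interventions, not the paper's procedure.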
Abstract
How do language models retrieve entity-specific facts from their parameters? We investigate this question by searching for sparse, entity-selective MLP neurons (which we call entity cells, by analogy to the "grandmother cell" hypothesis in neuroscience) and testing whether they play a causal role in factual recall. We localize candidate entity cells by ranking MLP neurons for activation consistency across varied prompts about the same entity, applying this procedure across seven models on a curated subset of PopQA. In all models, localized neurons cluster predominantly in early layers, an empirical pattern not imposed by the architecture. Using Qwen2.5-7B base as a model organism, we find the clearest causal evidence: suppressing a localized cell selectively erases recall for its matched entity while leaving others intact, and activating a single cell is sufficient to recover correct…
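The localization step ranks MLP neurons by how consistently they activate across varied prompts about the same entity. The exact scoring rule is not given here, so this sketch assumes one: each target-prompt activation is z-scored against a baseline of prompts about other entities, and a neuron's score is its minimum z-score, so it ranks highly only if it fires on every prompt. `consistency_scores` and `top_k_neurons` are hypothetical names, and the activations are toy numbers; real ones would be recorded from a model's MLP layers.

```python
from statistics import mean, pstdev


def consistency_scores(target_acts, other_acts):
    """Score each neuron for consistent, entity-selective activation.

    target_acts: one activation vector per prompt about the target entity.
    other_acts:  one activation vector per prompt about other entities (baseline).
    Hypothetical rule: z-score every target-prompt activation against the
    baseline, then keep the minimum, so a neuron scores highly only if it
    fires on *every* target prompt.
    """
    n_neurons = len(target_acts[0])
    scores = []
    for j in range(n_neurons):
        baseline = [row[j] for row in other_acts]
        mu = mean(baseline)
        sd = pstdev(baseline) or 1e-6  # guard against a constant baseline
        z = [(row[j] - mu) / sd for row in target_acts]
        scores.append(min(z))
    return scores


def top_k_neurons(scores, k):
    """Return the indices of the k highest-scoring neurons."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 3 prompts about the target entity, 4 about other entities, 3 neurons.
target = [[0.1, 2.0, 3.0], [0.0, 1.8, 0.0], [0.2, 2.2, 0.1]]
others = [[0.1, 0.0, 0.1], [0.0, 0.1, 0.0], [0.2, 0.0, 0.2], [0.1, 0.1, 0.1]]

scores = consistency_scores(target, others)
print(top_k_neurons(scores, 1))  # → [1]
```

Taking the minimum rather than the mean is what rewards consistency: neuron 2 has the single largest activation but fires on only one target prompt, so it ranks below neuron 1, which fires on all three.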
