TL;DR
This paper uncovers a structured exception handler in the final MLP layer of GPT-2 Small. It shows how neurons coordinate to route knowledge and shape predictions, which challenges the view that knowledge is stored statically.
Contribution
The paper identifies a detailed exception-handling architecture within GPT-2 Small. It shows that neurons coordinate to route information rather than store facts, which has implications for interpretability.
Findings
In GPT-2 Small's final MLP layer, 27 named neurons form a three-tier exception handler.
"Knowledge neurons" act as routing infrastructure, amplifying signals from the residual stream.
The architecture operates on token-level predictability rather than on syntactic structure.
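A per-neuron decomposition can be exact for a simple reason. An MLP's output is a sum of per-neuron write vectors (each neuron's activation times its row of output weights) plus an output bias. Any partition of the neurons into named tiers and a residual group therefore reconstructs the output up to floating-point error.

The toy, stdlib-only sketch below illustrates this under several assumptions:
- The dimensions are shrunk from GPT-2 Small's 768 × 3,072 for speed.
- The weights are random placeholders.
- The tier sizes (5, 10, 5, 7) follow the abstract, but the neuron indices are hypothetical.
- Helper names such as `group_write` are illustrative, not the paper's code.

```python
import math
import random

# Toy sizes (assumption): GPT-2 Small's final MLP is 768 -> 3,072 -> 768; shrunk here for speed.
D_MODEL, N_NEURONS = 16, 64

def gelu(x: float) -> float:
    """Tanh approximation of GELU, the variant GPT-2 uses."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

rng = random.Random(0)
# Random placeholder weights; row i of W_out is neuron i's write vector into the residual stream.
W_in = [[rng.gauss(0, 0.3) for _ in range(D_MODEL)] for _ in range(N_NEURONS)]
b_in = [rng.gauss(0, 0.1) for _ in range(N_NEURONS)]
W_out = [[rng.gauss(0, 0.3) for _ in range(D_MODEL)] for _ in range(N_NEURONS)]
b_out = [rng.gauss(0, 0.1) for _ in range(D_MODEL)]

def activations(x):
    """Post-GELU activation a_i of every neuron for input vector x."""
    return [gelu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W_in, b_in)]

def mlp_output(x):
    """Full MLP output: sum_i a_i * W_out[i] + b_out."""
    a = activations(x)
    return [sum(a[i] * W_out[i][d] for i in range(N_NEURONS)) + b_out[d] for d in range(D_MODEL)]

def group_write(x, neuron_ids):
    """Contribution of a subset of neurons to the output (output bias excluded)."""
    a = activations(x)
    return [sum(a[i] * W_out[i][d] for i in neuron_ids) for d in range(D_MODEL)]

# Hypothetical tier membership: sizes match the abstract (5/10/5/7), indices are made up.
TIERS = {
    "core": list(range(0, 5)),
    "differentiators": list(range(5, 15)),
    "specialists": list(range(15, 20)),
    "consensus": list(range(20, 27)),
}
named = {i for ids in TIERS.values() for i in ids}
TIERS["residual"] = [i for i in range(N_NEURONS) if i not in named]

# Sum the per-tier writes, add the bias once, and compare with the full MLP output.
x = [rng.gauss(0, 1) for _ in range(D_MODEL)]
parts = {name: group_write(x, ids) for name, ids in TIERS.items()}
reconstructed = [sum(p[d] for p in parts.values()) + b_out[d] for d in range(D_MODEL)]
max_err = max(abs(r - o) for r, o in zip(reconstructed, mlp_output(x)))
print(f"max reconstruction error: {max_err:.1e}")
```

The identity holds for any elementwise activation and any partition. In the real model, the interpretive work lies in deciding which neurons belong to which tier, not in the arithmetic.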
Abstract
The final MLP of GPT-2 Small exhibits a fully legible routing program: 27 named neurons organized into a three-tier exception handler. The knowledge it routes, by contrast, remains entangled across ~3,040 residual neurons. We decompose all 3,072 neurons (to numerical precision) into 5 fused Core neurons that reset vocabulary toward function words, 10 Differentiators that suppress wrong candidates, 5 Specialists that detect structural boundaries, and 7 Consensus neurons that each monitor a distinct linguistic dimension. The consensus-exception crossover, where MLP intervention shifts from helpful to harmful, is statistically sharp: bootstrap 95% CIs exclude zero at all consensus levels, and the crossover lies between 4/7 and 5/7. Three experiments show that "knowledge neurons" (Dai et al., 2022), at L11 of this model, function as routing infrastructure rather than fact storage: the MLP amplifies or…
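The crossover analysis can be illustrated with a percentile bootstrap. This sketch rests on several assumptions:
- A per-token "MLP benefit" score exists, positive when the MLP helps the correct token.
- Scores are bucketed by consensus level k/7, the number of Consensus neurons that agree.
- The data are synthetic, generated so that the mean changes sign between 4/7 and 5/7. They are not the paper's measurements.

The scoring metric, sample sizes, and direction of the effect are all assumptions, and the paper may use a different bootstrap variant.

```python
import random
from statistics import fmean

rng = random.Random(42)
LEVELS = range(8)  # consensus level k/7: how many of the 7 Consensus neurons agree (assumed definition)

# Synthetic per-token "MLP benefit" scores (positive = MLP output helps the correct token).
# Illustrative only: the means are chosen so the sign flips between 4/7 and 5/7.
data = {k: [rng.gauss(0.6 * (4.5 - k), 0.5) for _ in range(200)] for k in LEVELS}

def bootstrap_ci(sample, n_boot=1000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    means = sorted(fmean(rng.choices(sample, k=len(sample))) for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

results = {}
for k in LEVELS:
    lo, hi = bootstrap_ci(data[k])
    results[k] = (fmean(data[k]), lo, hi)
    print(f"{k}/7: mean={results[k][0]:+.3f}  95% CI=[{lo:+.3f}, {hi:+.3f}]  excludes 0: {lo > 0 or hi < 0}")

# Crossover: first pair of adjacent levels whose mean effects have opposite signs.
crossover = next(
    ((k, k + 1) for k in LEVELS if k + 1 in results and results[k][0] * results[k + 1][0] < 0),
    None,
)
print("crossover between levels:", crossover)
```

With real measurements, `data` would hold per-token scores for each consensus level. A crossover counts as "sharp" in this sense when the intervals on both sides of the sign change exclude zero.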
