Darkness Visible: Reading the Exception Handler of a Language Model

Peter Balogh

arXiv:2604.04756·cs.LG·April 8, 2026

TL;DR

The final MLP of GPT-2 Small contains a structured, three-tier exception handler built from 27 named neurons. These neurons route knowledge rather than store it, which challenges the view of "knowledge neurons" as static fact storage.

Contribution

The paper identifies a detailed exception-handling architecture in the final MLP of GPT-2 Small and shows that its neurons coordinate to route information rather than store facts, a result with implications for interpretability.

Findings

01. Neurons form a three-tier exception handler in GPT-2 Small's final layer.
02. Knowledge neurons act as routing infrastructure, amplifying signals from residual streams.
03. The architecture operates at token-level predictability, not syntactic structure.

Abstract

The final MLP of GPT-2 Small exhibits a fully legible routing program -- 27 named neurons organized into a three-tier exception handler -- while the knowledge it routes remains entangled across ~3,040 residual neurons. We decompose all 3,072 neurons (to numerical precision) into: 5 fused Core neurons that reset vocabulary toward function words, 10 Differentiators that suppress wrong candidates, 5 Specialists that detect structural boundaries, and 7 Consensus neurons that each monitor a distinct linguistic dimension. The consensus-exception crossover -- where MLP intervention shifts from helpful to harmful -- is statistically sharp (bootstrap 95% CIs exclude zero at all consensus levels; crossover between 4/7 and 5/7). Three experiments show that "knowledge neurons" (Dai et al., 2022), at L11 of this model, function as routing infrastructure rather than fact storage: the MLP amplifies or…
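The decomposition "to numerical precision" mentioned in the abstract rests on a general property of transformer MLPs: the output is a sum of per-neuron contributions, so any partition of the neurons into groups must add back up to the full output. The following is a minimal sketch of that bookkeeping, not the author's code. The tier sizes come from the abstract; the neuron indices, the random weights, the toy output width `D_MODEL = 8` (GPT-2 Small uses 768), and all function names are assumptions made for illustration, and the output bias is omitted.

```python
import math
import random

N_NEURONS = 3072  # hidden width of each GPT-2 Small MLP
D_MODEL = 8       # toy output width for speed (assumption; the real model uses 768)

# Tier sizes as stated in the abstract. Which neuron indices belong to
# which tier is NOT given here, so the indices below are placeholders.
TIER_SIZES = {"core": 5, "differentiator": 10, "specialist": 5, "consensus": 7}


def assign_tiers(n_neurons, tier_sizes):
    """Map tier name -> neuron indices; all unnamed neurons go to 'residual'."""
    tiers, start = {}, 0
    for name, size in tier_sizes.items():
        tiers[name] = list(range(start, start + size))
        start += size
    tiers["residual"] = list(range(start, n_neurons))
    return tiers


def gelu(x):
    """tanh approximation of GELU, the activation used in GPT-2."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def group_contribution(acts, w_out, indices):
    """Sum of act_i * W_out[i] over the given neuron indices."""
    out = [0.0] * len(w_out[0])
    for i in indices:
        a, row = acts[i], w_out[i]
        for d in range(len(out)):
            out[d] += a * row[d]
    return out


# Random stand-ins for one token's activations and the output projection.
rng = random.Random(0)
acts = [gelu(rng.gauss(0, 1)) for _ in range(N_NEURONS)]
w_out = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(N_NEURONS)]

tiers = assign_tiers(N_NEURONS, TIER_SIZES)
full = group_contribution(acts, w_out, range(N_NEURONS))
parts = {name: group_contribution(acts, w_out, idx) for name, idx in tiers.items()}

# Summing the per-tier contributions should recover the full output
# up to floating-point rounding.
recombined = [sum(p[d] for p in parts.values()) for d in range(D_MODEL)]
max_err = max(abs(a - b) for a, b in zip(full, recombined))

named = sum(TIER_SIZES.values())
print(named, len(tiers["residual"]), max_err < 1e-9)
```

With these tier sizes the named neurons total 27 and the remainder is 3,045, which is consistent with the abstract's "~3,040 residual neurons".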

Code & Models

Repositories

pbalogh/transparent-gpt2 (GitHub)