The Discrete Charm of the MLP: Binary Routing of Continuous Signals in Transformer Feed-Forward Layers
Peter Balogh

TL;DR
This paper shows that transformer MLP layers perform binary routing of continuous signals: neurons act as consensus switches that decide whether a token requires nonlinear processing, which explains why polynomial approximations of these layers fall short.
Contribution
It identifies the binary routing mechanism in transformer MLP layers, characterizes how it develops across layers and how much it matters functionally, and offers a new perspective on neural computation.
Findings
Binary neuron activations effectively route signals without information loss.
Removing consensus neurons significantly increases perplexity, confirming their functional role.
Binary routing explains the failure of polynomial approximations in nonlinear layers.
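The intuition behind the last finding can be shown with a toy experiment. This is our own illustration, not the paper's method: a hypothetical routed output that jumps when a binary gate flips cannot be matched uniformly by any continuous function, so least-squares polynomial fits of increasing degree keep a worst-case error of roughly half the jump, while a smooth target is fit almost exactly. The target functions, the interval [-3, 3], and the grid size are arbitrary assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def routed(x):
    # Hypothetical binary route: the continuous signal (1 + x) passes only when
    # the gate (x > 0) is ON, so the output jumps by about 1 at x = 0.
    return np.where(x > 0, 1.0 + x, 0.0)


def smooth(x):
    # Smooth comparison target with no switch.
    return np.sin(x)


def max_fit_error(f, degree, n=2001):
    """Least-squares polynomial fit of f on [-3, 3]; return the max absolute error on the grid."""
    x = np.linspace(-3.0, 3.0, n)
    y = f(x)
    p = Polynomial.fit(x, y, degree)  # maps x to [-1, 1] internally for numerical stability
    return float(np.max(np.abs(p(x) - y)))


for degree in (3, 7, 15):
    print(f"degree {degree:2d}: routed err = {max_fit_error(routed, degree):.3f}, "
          f"smooth err = {max_fit_error(smooth, degree):.2e}")
```

Raising the degree drives the smooth target's error toward zero, but the routed target's worst-case error stays near 0.5, because any continuous approximant must miss a jump of size 1 by about half of it on one side. In this simplified setting, the discreteness of the routing decision, not the continuity of the routed signal, is what defeats the polynomial.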
Abstract
We show that MLP layers in transformer language models perform binary routing of continuous signals: the decision of whether a token needs nonlinear processing is well captured by binary neuron activations, even though the signals being routed are continuous. In GPT-2 Small (124M parameters), we find that specific neurons implement a consensus architecture (seven "default-ON" neurons and one exception handler, N2123 in Layer 11, that are 93-98% mutually exclusive), which together form a binary routing switch. A cross-layer analysis reveals a developmental arc: early layers (L1-3) use single gateway neurons to route exceptions without consensus quorums; middle layers (L4-6) show diffuse processing with neither gateway nor consensus; and late layers (L7-11) crystallize full consensus/exception architectures with increasing quorum size (1 to 3 to 7 consensus neurons). Causal validation confirms…
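The consensus/exception switch described in the abstract can be made concrete with a toy sketch. Everything here is an assumption for illustration, not the paper's procedure: activations are binarized at a hypothetical threshold of 0; mutual exclusivity is measured as 1 minus the Jaccard overlap of the token sets on which two neurons are ON (the exact metric behind the reported 93-98% is not given in this excerpt); and the hypothetical `route` helper lets the exception neuron override a quorum vote of consensus neurons. The activation values are made up and are not GPT-2 data.

```python
import numpy as np


def binarize(acts, threshold=0.0):
    """Hypothetical binarization: a neuron is ON for a token when its activation exceeds threshold."""
    return acts > threshold


def mutual_exclusivity(on_a, on_b):
    """Assumed metric: 1 - |A & B| / |A | B| over tokens where either neuron is ON."""
    either = np.logical_or(on_a, on_b).sum()
    if either == 0:
        return 1.0
    both = np.logical_and(on_a, on_b).sum()
    return float(1.0 - both / either)


def route(consensus_on, exception_on, quorum):
    """Toy switch: 'exception' if the exception neuron fires; otherwise 'default'
    if at least `quorum` consensus neurons fire; otherwise 'undecided'."""
    votes = consensus_on.sum(axis=1)  # number of consensus neurons ON per token
    return np.where(exception_on, "exception",
                    np.where(votes >= quorum, "default", "undecided"))


# Made-up activations: 6 tokens x 3 consensus neurons, plus 1 exception neuron.
consensus = np.array([[1.2, 0.8, 0.5],
                      [0.9, 1.1, -0.2],
                      [-0.3, -0.1, -0.4],
                      [0.7, 0.6, 0.9],
                      [-0.5, -0.2, 0.1],
                      [1.0, -0.3, 0.4]])
exception = np.array([-0.6, -0.4, 1.5, -0.2, 0.9, -0.1])

cons_on = binarize(consensus)
exc_on = binarize(exception)
print(route(cons_on, exc_on, quorum=2))
for j in range(cons_on.shape[1]):
    print(f"consensus neuron {j}: exclusivity = {mutual_exclusivity(cons_on[:, j], exc_on):.2f}")
```

Letting the exception neuron override the vote is one simple way to turn near-exclusive firing into a clean two-way switch; with truly exclusive neurons the override never conflicts with the quorum. The quorum size is a free parameter here, loosely echoing the 1, 3, and 7 consensus neurons reported across late layers.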
