Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
J. Clayton Kerce

TL;DR
This paper introduces architectural modifications and supervision techniques in transformers to reveal and control hidden modularity, enabling predictable and causal influence over specific behaviors.
Contribution
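The paper combines three architectural interventions to expose and exploit modularity in transformer models: dual-stream processing that separates token and contextual representations, per-layer supervision that provides an independent gradient signal at each depth, and gated attention regularized toward discrete activation patterns.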
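To make the idea of per-layer supervision concrete, here is a minimal NumPy sketch of a loss that scores every layer's hidden state against the same next-token targets. It is an illustration under stated assumptions, not the paper's implementation: the shared unembedding matrix `unembed`, the uniform layer weights, and the names `cross_entropy` and `per_layer_loss` are all hypothetical, and the sketch computes loss values only (no gradients, no dual-stream or gating).

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy for logits of shape (T, V)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def per_layer_loss(hidden_states, unembed, targets, weights=None):
    """Weighted sum of one cross-entropy loss per layer.

    hidden_states: list of (T, D) arrays, one per layer (assumed layout)
    unembed:       (D, V) projection shared by all layers (an assumption)
    targets:       (T,) integer token ids
    """
    n_layers = len(hidden_states)
    if weights is None:
        weights = [1.0 / n_layers] * n_layers  # uniform weighting, assumed
    losses = [cross_entropy(h @ unembed, targets) for h in hidden_states]
    total = sum(w * l for w, l in zip(weights, losses))
    return total, losses

# Toy usage with random activations standing in for a real model.
rng = np.random.default_rng(0)
T, D, V, L = 5, 8, 11, 3
hidden_states = [rng.normal(size=(T, D)) for _ in range(L)]
unembed = rng.normal(size=(D, V))
targets = rng.integers(0, V, size=T)
total, per_layer = per_layer_loss(hidden_states, unembed, targets)
```

A standard objective would use only the last entry of `per_layer`; the per-layer variant adds a loss term at every depth, so each layer receives its own training signal.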
Findings
Ablation effects are 5 to 23 times larger in models trained with per-layer supervision than in architecturally identical controls trained with standard objectives.
Models with per-layer supervision show four times greater control over targeted behaviors.
The approach reveals widespread modularity, enabling causal control over model components.
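The ablation effects above come from removing a component and measuring how much the output changes. The following NumPy sketch shows one simple way to do that for a single attention head. It assumes zero-ablation (multiplying the head's output by 0), a toy unmasked attention layer with random weights, and a relative L2 change as the effect size; the paper's actual protocol and metric are not specified here, and the names `multi_head_attention` and `ablation_effect` are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, head_mask):
    """Toy attention layer (no causal mask).

    x: (T, D); wq, wk, wv: (H, D, Dh); wo: (H, Dh, D); head_mask: (H,)
    Each head's contribution is scaled by its mask entry (0 = ablated).
    """
    out = np.zeros_like(x)
    for h in range(wq.shape[0]):
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        out += head_mask[h] * (attn @ v @ wo[h])
    return out

def ablation_effect(x, params, head):
    """Relative L2 change in the layer output when one head is zeroed."""
    n_heads = params[0].shape[0]
    full = multi_head_attention(x, *params, np.ones(n_heads))
    mask = np.ones(n_heads)
    mask[head] = 0.0
    ablated = multi_head_attention(x, *params, mask)
    return float(np.linalg.norm(full - ablated) / np.linalg.norm(full))

# Toy usage: random weights stand in for a trained model.
rng = np.random.default_rng(0)
T, D, H, Dh = 6, 16, 4, 4
x = rng.normal(size=(T, D))
params = (
    rng.normal(size=(H, D, Dh)),  # wq
    rng.normal(size=(H, D, Dh)),  # wk
    rng.normal(size=(H, D, Dh)),  # wv
    rng.normal(size=(H, Dh, D)),  # wo
)
effects = [ablation_effect(x, params, h) for h in range(H)]
```

This measures only the direct effect within one layer. The compensation described in the abstract arises when other components make up for the removed one, so observing it would require running the full model and comparing behavior before and after the ablation.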
Abstract
Transformers resist surgical control. Ablating an attention head identified as critical for capitalization produces minimal behavioral change because distributed redundancy compensates for damage. This Hydra effect renders interpretability illusory: we may identify components through correlation, but cannot predict or control their causal role. We demonstrate that architectural interventions can expose hidden modularity. Our approach combines dual-stream processing separating token and contextual representations, per-layer supervision providing independent gradient signal at each depth, and gated attention regularizing toward discrete activation patterns. When trained with per-layer supervision, models produce ablation effects 5 to 23 times larger than architecturally identical controls trained with standard objectives. This enables 4 times greater control leverage on targeted…
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Embodied and Extended Cognition · Neural Networks and Reservoir Computing
