The Routing and Filtering Structure of Attention

Shafayeth Jamil; Rehan Kapadia

arXiv:2605.18826·cs.LG·May 20, 2026

TL;DR

This paper analyzes the structure of attention in transformers, decomposing it into routing and filtering components, and introduces a diagnostic parameterization to improve interpretability and efficiency.

Contribution

It introduces $S$-$D$ attention to disentangle routing from filtering, revealing spectral cascades and enabling simplified, efficient attention mechanisms.

Findings

01

Routing operates at low rank, below the allocated capacity.

02

Linearizing the early layers of $S$-$D$ attention increases perplexity by less than 5%.

03

Cascade architectures reduce attention parameters significantly with minimal perplexity increase.

Abstract

The attention interaction matrix $Q K^{\top}$ contains two entangled computations: a skew-symmetric component that redistributes information between positions (routing) and a symmetric component that scales mutual relevance (filtering). We decompose 1776 heads across five pretrained transformers and find routing operating at low rank, well below the routing capacity allocated by the weight kernel. We introduce $S$-$D$ attention as a diagnostic parameterization that disentangles routing from filtering by construction, with guaranteed stability ($\operatorname{Re}(\lambda) \leq 0$), and trains stably without layer normalization. When disentangled and unnormalized, routing self-organizes into a spectral cascade: effective rank $2$ at the first layer, expanding with depth, across six scales from 7M to 355M parameters. The cascade predicts where attention can be simplified: linearizing the first seven…
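The symmetric/skew-symmetric split described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `split_interaction` and `effective_rank`, the random weights, and the shapes `d_model` and `d_head` are all hypothetical, and the entropy-based effective rank is an assumed definition, since the abstract does not say which one the paper uses.

```python
import numpy as np


def split_interaction(w_q, w_k):
    """Split M = W_Q W_K^T into its symmetric and skew-symmetric parts.

    Any square matrix M satisfies M = sym + skew with
    sym = (M + M^T) / 2 and skew = (M - M^T) / 2.
    """
    m = w_q @ w_k.T
    sym = 0.5 * (m + m.T)    # "filtering" part in the abstract's terminology
    skew = 0.5 * (m - m.T)   # "routing" part in the abstract's terminology
    return sym, skew


def effective_rank(mat, eps=1e-12):
    """Entropy-based effective rank (an assumed definition, not from the paper).

    Normalizes the singular values to a distribution p and returns
    exp(H(p)), where H is the Shannon entropy.
    """
    s = np.linalg.svd(mat, compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0
    p = s / total
    p = p[p > eps]  # drop numerically zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))


# Hypothetical head: random weights stand in for a pretrained checkpoint.
rng = np.random.default_rng(0)
d_model, d_head = 64, 16
w_q = rng.standard_normal((d_model, d_head))
w_k = rng.standard_normal((d_model, d_head))

sym, skew = split_interaction(w_q, w_k)
print("effective rank, symmetric part:", effective_rank(sym))
print("effective rank, skew part:     ", effective_rank(skew))
```

With random weights the two ranks carry no meaning beyond the bound of `2 * d_head` that the construction imposes on each part; the abstract's low-rank routing finding concerns trained heads.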
