Intrinsic and Extrinsic Organized Attention: Softmax Invariance and Network Sparsity
Oluwadamilola Fasina, Ruben V.C. Pohle, Pei-Chun Su, Ronald R. Coifman

TL;DR
This paper investigates the structure of self-attention in transformers, demonstrating invariance properties, hierarchical organization, and sparsity, which enhance interpretability and enable applications like pruning and architecture comparison.
Contribution
It provides theoretical evidence, obtained via paradifferential calculus, that self-attention is invariant to softmax activation, and it applies an existing hierarchical tensor organization methodology to analyze network structure and sparsity.
Findings
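Theoretical evidence, supported by computational examples, indicates that self-attention is invariant to softmax activation.
Hierarchical organization of network 3-tensors along the query, key, and head axes reveals network structure and regularity.
Network sparsity is analyzed via expansion coefficients, enabling pruning.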
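To illustrate the idea of measuring sparsity through expansion coefficients, here is a minimal sketch, not the paper's method. It assumes a plain orthonormal Haar basis on a dyadic tree as a stand-in for a basis built on the paper's partition trees; the names `haar_matrix` and `prune_coefficients` and the toy signal are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis (as rows) for n a power of two.
    Assumption: a dyadic tree stands in for a learned partition tree."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                   # coarser-scale functions
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # finest-scale differences
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def prune_coefficients(signal, keep):
    """Keep the `keep` largest-magnitude expansion coefficients, zero the rest."""
    H = haar_matrix(signal.size)
    coeffs = H @ signal
    smallest = np.argsort(np.abs(coeffs))[: coeffs.size - keep]
    pruned = coeffs.copy()
    pruned[smallest] = 0.0
    reconstruction = H.T @ pruned  # H is orthonormal, so H.T inverts it
    return coeffs, pruned, reconstruction

# Toy signal that is regular (piecewise constant) on the dyadic tree.
signal = np.array([2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0])
coeffs, pruned, reconstruction = prune_coefficients(signal, keep=2)
n_significant = int(np.count_nonzero(np.abs(coeffs) > 1e-12))
print(n_significant, np.allclose(reconstruction, signal))
```

When a signal is regular with respect to the tree, only a few coefficients are non-negligible, so the remaining ones can be dropped with little or no reconstruction error. That is the sense in which sparse expansion coefficients suggest a pruning rule.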
Abstract
We examine the intrinsic (within the attention head) and extrinsic (amongst the attention heads) structure of the self-attention mechanism in transformers. Theoretical evidence for invariance of the self-attention mechanism to softmax activation is obtained by appealing to paradifferential calculus (and is supported by computational examples), which relies on the intrinsic organization of the attention heads. Furthermore, we use an existing methodology for hierarchical organization of tensors to examine network structure by constructing hierarchical partition trees with respect to the query, key, and head axes of network 3-tensors. Such an organization is consequential since it allows one to profitably execute common signal processing tasks on a geometry where the organized network 3-tensors exhibit regularity. We exemplify this qualitatively by visualizing the hierarchical organization…
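As a concrete picture of a network 3-tensor with query, key, and head axes, the following sketch stacks standard scaled dot-product attention maps from several heads into one array. It is an assumption about how such a tensor might be formed, not the paper's exact construction; the function `attention_tensor`, the random weights, and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_tensor(X, W_q, W_k):
    """Stack per-head attention maps into a (head, query, key) array.

    X:   (n_tokens, d_model) token embeddings
    W_q: (n_heads, d_model, d_head) query projections
    W_k: (n_heads, d_model, d_head) key projections
    """
    Q = np.einsum("nd,hdk->hnk", X, W_q)  # per-head queries
    K = np.einsum("nd,hdk->hnk", X, W_k)  # per-head keys
    d_head = W_q.shape[-1]
    scores = np.einsum("hqk,hmk->hqm", Q, K) / np.sqrt(d_head)
    return softmax(scores, axis=-1)       # each query row sums to 1

rng = np.random.default_rng(0)
n_tokens, d_model, n_heads, d_head = 6, 16, 4, 8
X = rng.normal(size=(n_tokens, d_model))
W_q = rng.normal(size=(n_heads, d_model, d_head))
W_k = rng.normal(size=(n_heads, d_model, d_head))

T = attention_tensor(X, W_q, W_k)
print(T.shape)  # (4, 6, 6): head, query, key
```

Each axis of `T` can then be treated as a set of points to be organized: slices along the head axis compare heads with one another (extrinsic structure), while slices along the query and key axes describe structure within a head (intrinsic structure).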
Taxonomy
Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence
Methods: Pruning · Diffusion · Softmax
