Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

Dominik Dahlem; Diego Maniloff; Mac Misiura

arXiv:2605.04893 · cs.LG · May 7, 2026

TL;DR

This paper shows that symmetric spectral diagnostics of attention in large language models are orientation-blind, and it proposes a two-axis diagnostic for detecting attention failure modes.

Contribution

It proves that symmetric spectral diagnostics cannot detect information flow direction and introduces a polarity prediction framework for attention failure modes.

Findings

01

Symmetric spectral diagnostics are orientation-blind: they cannot distinguish an operator from its transpose.

02

Transport capacity has a lower bound of 1/5, with window attention surpassing this floor.

03

Transport features retain interpretability up to 8B parameters, with polarity reversing as predicted.

Abstract

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an…
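
The orientation-blindness claim can be illustrated numerically. The sketch below is a minimal example under stated assumptions, not the authors' construction: it takes the symmetric component to be (A + Aᵀ)/2 of a row-stochastic uniform causal attention matrix A, and the paper's degree normalization may differ. The helper names and the `asymmetry_ratio` quantity are hypothetical and are not the paper's coefficient $G$.

```python
import numpy as np


def uniform_causal_attention(n):
    """Row-stochastic lower-triangular matrix: token i attends uniformly to tokens 0..i."""
    mask = np.tril(np.ones((n, n)))
    return mask / mask.sum(axis=1, keepdims=True)


def symmetric_part(a):
    # Assumed form of the "symmetric component"; the paper's normalization may differ.
    return 0.5 * (a + a.T)


def antisymmetric_part(a):
    return 0.5 * (a - a.T)


def symmetric_spectrum(a):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(symmetric_part(a))


def asymmetry_ratio(a):
    # Hypothetical illustration only (not the paper's G):
    # Frobenius norm of the antisymmetric part relative to the whole operator.
    return np.linalg.norm(antisymmetric_part(a)) / np.linalg.norm(a)


A = uniform_causal_attention(6)

# Transposing A reverses the direction of information flow, yet the symmetric
# part, and therefore every spectral quantity computed from it, is unchanged.
same_spectrum = np.allclose(symmetric_spectrum(A), symmetric_spectrum(A.T))

# The antisymmetric part is where direction is recorded: it flips sign.
flips_sign = np.allclose(antisymmetric_part(A.T), -antisymmetric_part(A))

print("symmetric spectrum identical under transpose:", same_spectrum)
print("antisymmetric part flips sign under transpose:", flips_sign)
print("asymmetry ratio (hypothetical):", asymmetry_ratio(A))
```

Because (A + Aᵀ)/2 is literally the same matrix for A and Aᵀ, the equality holds for any square A, not only for the uniform causal case used here. A norm of the antisymmetric part, as in `asymmetry_ratio`, measures how much asymmetry is present but is itself transpose-invariant, so recovering direction requires a signed quantity.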
