Singular Vectors of Attention Heads Align with Features

Gabriel Franco; Carson Loughridge; Mark Crovella

arXiv:2602.13524 · cs.LG · February 17, 2026

TL;DR

This paper investigates when and why singular vectors of attention matrices in language models align with feature representations, providing theoretical justification and empirical evidence for their use in interpretability.

Contribution

The paper offers a theoretical framework and empirical validation for using singular vectors of attention matrices to identify feature representations in language models.
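To make "alignment" concrete, here is a minimal toy sketch, not the paper's experimental setup. It assumes the bilinear attention matrix is W_QK = W_Q W_K^T, so that left singular vectors live on the query side and right singular vectors on the key side. It plants two hypothetical feature pairs with orthonormal directions, computes singular vectors by plain power iteration with deflation, and measures alignment as the absolute cosine between each singular vector and its planted feature. All helper names (`top_singular_triplets`, `query_feats`, `key_feats`) are illustrative.

```python
import math
import random

# Small dense-matrix helpers on plain lists (no third-party libraries).
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def normalize(x):
    n = norm(x)
    return [v / n for v in x]

def add_outer(M, a, b, scale):
    """Return M + scale * a b^T."""
    return [[m + scale * ai * bj for m, bj in zip(row, b)] for row, ai in zip(M, a)]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def top_singular_triplets(M, k, iters=200, seed=0):
    """Top-k singular triplets (sigma, u, v) via power iteration on R^T R with deflation."""
    rng = random.Random(seed)
    R = [row[:] for row in M]
    triplets = []
    for _ in range(k):
        Rt = transpose(R)
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(R[0]))])
        for _ in range(iters):
            # Repeated multiplication by R^T R converges to the top right singular vector.
            v = normalize(matvec(Rt, matvec(R, v)))
        Rv = matvec(R, v)
        sigma = norm(Rv)
        u = [x / sigma for x in Rv]
        triplets.append((sigma, u, v))
        R = add_outer(R, u, v, -sigma)  # deflate: remove the component just found
    return triplets

# Hypothetical planted features: orthonormal query-side and key-side directions.
s = 1 / math.sqrt(2)
query_feats = [[s, s, 0.0, 0.0], [s, -s, 0.0, 0.0]]
key_feats = [[0.0, 0.0, s, s], [0.0, 0.0, s, -s]]
strengths = [3.0, 1.0]

# Toy attention matrix W_QK = sum_f strength_f * q_f k_f^T.
W_qk = [[0.0] * 4 for _ in range(4)]
for a, q, kf in zip(strengths, query_feats, key_feats):
    W_qk = add_outer(W_qk, q, kf, a)

# Alignment = |cosine| with the planted feature (singular vector signs are arbitrary).
for i, (sigma, u, v) in enumerate(top_singular_triplets(W_qk, 2)):
    print(f"sigma={sigma:.3f}  |cos(u, q)|={abs(cosine(u, query_feats[i])):.3f}  "
          f"|cos(v, k)|={abs(cosine(v, key_feats[i])):.3f}")
```

Here both planted pairs are recovered (|cos| close to 1) because the features are orthonormal and have distinct strengths; with equal strengths the singular subspace is degenerate and any rotation within it is an equally valid SVD. Singular vectors are also mutually orthogonal, so they cannot coincide exactly with feature directions that overlap. When alignment is nonetheless expected is the question the paper studies; this toy does not reproduce that analysis.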

Findings

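01. Singular vectors robustly align with features in a model where features can be directly observed.
02. Theory shows that such alignment is expected under a range of conditions.
03. Sparse attention decomposition, a testable prediction of alignment, indicates feature alignment in real models where features are not directly observable.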

Abstract

Identifying feature representations in language models is a central task in mechanistic interpretability. Several recent studies have made an implicit assumption that feature representations can be inferred in some cases from singular vectors of attention matrices. However, sound justification for this assumption is lacking. In this paper we address that question, asking: why and when do singular vectors align with features? First, we demonstrate that singular vectors robustly align with features in a model where features can be directly observed. We then show theoretically that such alignment is expected under a range of conditions. We close by asking how, operationally, alignment may be recognized in real models where feature representations are not directly observable. We identify sparse attention decomposition as a testable prediction of alignment, and show evidence that it emerges…
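The "sparse attention decomposition" mentioned above can be illustrated by expanding an attention logit over the singular triplets (sigma_k, u_k, v_k) of W_QK: x_q^T W_QK x_k = sum_k sigma_k (u_k · x_q)(v_k · x_k). The sketch below assumes that reading and ignores score scaling, biases, layer normalization and positional terms. The singular vectors, the token vectors, the helper names (`contributions`, `terms_needed`) and the 90% threshold are hypothetical choices, not the paper's criterion. A pair counts as "sparse" here when a handful of terms carries most of the logit.

```python
# Hypothetical orthonormal singular vectors for a 4-dimensional toy head.
h = 0.5
U = [[h, h, h, h], [h, -h, h, -h], [h, h, -h, -h], [h, -h, -h, h]]  # left (query side)
V = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # right (key side)
sigmas = [4.0, 2.0, 1.0, 0.5]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Rebuild W_QK = sum_k sigma_k u_k v_k^T from its singular triplets.
W_qk = [[sum(s * u[i] * v[j] for s, u, v in zip(sigmas, U, V)) for j in range(4)]
        for i in range(4)]

def score(x_q, x_k):
    """Raw attention logit x_q^T W_QK x_k."""
    return dot(x_q, [dot(row, x_k) for row in W_qk])

def contributions(x_q, x_k):
    """Per-component terms sigma_k (u_k . x_q)(v_k . x_k); they sum to score(x_q, x_k)."""
    return [s * dot(u, x_q) * dot(v, x_k) for s, u, v in zip(sigmas, U, V)]

def terms_needed(contribs, frac=0.9):
    """Fewest largest-magnitude terms whose running sum reaches frac of the total."""
    total = sum(contribs)
    running = 0.0
    for n, c in enumerate(sorted(contribs, key=abs, reverse=True), start=1):
        running += c
        if abs(running) >= frac * abs(total):
            return n
    return len(contribs)

def combine(vectors, weights):
    """Weighted sum of 4-dimensional vectors."""
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(4)]

# One pair concentrated on component 1, one pair spread evenly across components.
sparse_pair = (combine(U, [1.0, 0.5, 0.2, 0.0]), combine(V, [1.0, 0.3, 0.4, 0.5]))
dense_pair = (combine(U, [1.0, 1.0, 1.0, 1.0]), combine(V, [0.25, 0.5, 1.0, 2.0]))

for name, (x_q, x_k) in [("sparse", sparse_pair), ("dense", dense_pair)]:
    c = contributions(x_q, x_k)
    print(name, round(score(x_q, x_k), 3), round(sum(c), 3), terms_needed(c))
```

The direct logit and the summed contributions agree for both pairs, but the first pair needs one term to reach 90% of its logit while the second needs all four. In a real model the triplets would come from an SVD of a head's W_Q W_K^T rather than being chosen by hand.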

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling