Dimensional Collapse in Transformer Attention Outputs: A Challenge for Sparse Dictionary Learning

Junxuan Wang; Xuyang Ge; Wentao Shu; Zhengfu He; Xipeng Qiu

arXiv:2508.16929·cs.LG·February 12, 2026

TL;DR

This paper shows that transformer attention outputs are confined to a low-dimensional subspace, identifies this low-rank structure as a key factor behind dead features in sparse dictionary learning, and proposes a subspace-constrained training method to mitigate them.

Contribution

The study uncovers the low-rank structure of attention outputs and introduces a subspace-constrained training approach to improve sparse autoencoders.

Findings

01

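Attention outputs have an effective dimensionality of only about 60% of the full space, whereas MLP outputs and residual streams have effective ranks around 90%.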

02

This low-rank structure is a key factor in the dead feature problem in sparse dictionary learning.

03

Subspace-constrained training reduces dead features from 87% to below 1%.
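
The idea behind finding 03 can be illustrated with a rough sketch. This summary does not describe the paper's actual procedure, so everything here is an assumption made for illustration: the helper names `active_subspace` and `project_rows`, the 99% variance threshold, the synthetic data, and the choice to constrain only the initial feature directions. It estimates the subspace that activations occupy with PCA and projects randomly initialized feature directions into it.

```python
import numpy as np

def active_subspace(acts, var_kept=0.99):
    """Orthonormal basis (k, d_model) for the top principal directions of acts.

    var_kept is a hypothetical threshold, not a value taken from the paper.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum, var_kept)) + 1, len(s))
    return vt[:k]

def project_rows(w, basis):
    """Project each row of w onto span(basis) and renormalize to unit length."""
    proj = w @ basis.T @ basis
    return proj / np.linalg.norm(proj, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d_model, n_samples, true_rank, n_features = 64, 2048, 20, 256

# Synthetic stand-in for attention outputs: data confined to a 20-dim subspace.
acts = rng.standard_normal((n_samples, true_rank)) @ rng.standard_normal((true_rank, d_model))
basis = active_subspace(acts)

# Random unit-norm feature directions in the full space (the usual initialization).
w_rand = rng.standard_normal((n_features, d_model))
w_rand /= np.linalg.norm(w_rand, axis=1, keepdims=True)

# Subspace-constrained alternative: same directions, projected into the active subspace.
w_sub = project_rows(w_rand, basis)

# Fraction of each direction's norm that lies inside the active subspace.
overlap_rand = np.linalg.norm(w_rand @ basis.T, axis=1).mean()
overlap_sub = np.linalg.norm(w_sub @ basis.T, axis=1).mean()
print(f"mean overlap, random init: {overlap_rand:.2f}")
print(f"mean overlap, projected init: {overlap_sub:.2f}")
```

In this toy setting, random directions place a large share of their norm outside the subspace the data occupies, while the projected directions lie entirely inside it. Whether the paper constrains initialization, the whole training run, or both is not stated in this summary.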

Abstract

Transformer architectures, and their attention mechanisms in particular, form the foundation of modern large language models. While transformer models are widely believed to operate in high-dimensional hidden spaces, we show that attention outputs are in fact confined to a surprisingly low-dimensional subspace, with an effective dimensionality of only about $60\%$ of the full space. In contrast, MLP outputs and residual streams remain much closer to full-rank, exhibiting effective ranks around $90\%$. This striking dimensional discrepancy is consistently observed across diverse model families and datasets, and is strongly shaped by the attention output projection matrix. Critically, we identify this low-rank structure as a key factor in the prevalent dead feature problem in sparse dictionary learning, where it creates a mismatch between randomly initialized features and the intrinsic…
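
To make "effective dimensionality" concrete, here is a minimal sketch of one common way to measure it. The abstract as shown here does not say which measure the authors use, so the entropy-based effective rank below (the exponential of the Shannon entropy of the normalized singular values) is an assumption, as are the function name `effective_rank` and the synthetic data standing in for real model activations.

```python
import numpy as np

def effective_rank(acts):
    """Entropy-based effective rank of an (n_samples, d_model) activation matrix.

    Assumed measure: exp(H(p)), where p are the singular values of the
    mean-centered matrix normalized to sum to 1.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(np.exp(-np.sum(p * np.log(p))))

rng = np.random.default_rng(0)
d_model, n_samples, k = 64, 4096, 38

# Near full-rank activations: isotropic Gaussian noise in all d_model directions.
full = rng.standard_normal((n_samples, d_model))

# Low-rank activations: the same kind of noise pushed through a rank-k map,
# so every sample lies in a k-dimensional subspace of the d_model-dim space.
low = rng.standard_normal((n_samples, k)) @ rng.standard_normal((k, d_model))

full_ratio = effective_rank(full) / d_model
low_ratio = effective_rank(low) / d_model
print(f"full-rank data: effective rank / d_model = {full_ratio:.2f}")
print(f"rank-{k} data:   effective rank / d_model = {low_ratio:.2f}")
```

The ratio for the rank-`k` data cannot exceed `k / d_model`, which is the kind of gap the abstract reports between attention outputs and MLP outputs or residual streams. Applying this to a real model would mean replacing the synthetic arrays with collected activations.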
