Attention-based PCA
Rodrigo Maulen-Soto (LPSM, SU), Claire Boyer (IUF)

TL;DR
This paper demonstrates that attention mechanisms inherently perform PCA-like computations, establishing a theoretical connection between attention and principal component analysis in unsupervised learning.
Contribution
It provides a rigorous analysis showing attention layers learn principal eigenvectors of data covariance, with convergence proofs and extensions to in-context settings.
Findings
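Trained on Gaussian data, softmax and linear attention layers learn parameters aligned with the principal eigenvectors of the covariance matrix.
In the infinite-prompt limit, training converges to globally optimal solutions aligned with the leading spectral direction; with finite prompts the same behavior holds up to sampling effects.
In an in-context setting with spiked Wishart covariances, attention recovers the underlying signal direction.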
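
To make "aligned with the principal eigenvector" concrete, here is a minimal sketch that does not use an attention layer at all. As a stand-in, it runs projected gradient ascent on the classical PCA objective v^T S v over the unit sphere and measures the cosine between the learned vector and the leading eigenvector of the sample covariance S. The covariance spectrum, step size, and sample size are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20000

# Hypothetical covariance with a clear leading direction.
eigvals = np.array([4.0, 1.0, 0.5, 0.25, 0.1])
q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthonormal basis

# Gaussian samples with covariance q @ diag(eigvals) @ q.T.
x = (rng.standard_normal((n, d)) * np.sqrt(eigvals)) @ q.T
sample_cov = x.T @ x / n

# Projected gradient ascent on v^T S v over the unit sphere
# (a stand-in for training; NOT the paper's attention layer).
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
for _ in range(500):
    v = v + 0.1 * (sample_cov @ v)  # gradient step, up to a constant factor
    v /= np.linalg.norm(v)          # project back onto the sphere

# eigh returns eigenvalues in ascending order, so the last column is the top one.
top = np.linalg.eigh(sample_cov)[1][:, -1]
alignment = abs(v @ top)                 # sign of an eigenvector is arbitrary
population_alignment = abs(v @ q[:, 0])  # vs. the true leading direction
print(f"alignment with sample top eigenvector: {alignment:.4f}")
print(f"alignment with population direction:   {population_alignment:.4f}")
```

The gap between the two printed alignments is one way to picture the "sampling effects" mentioned for finite prompts: the learned direction matches the sample covariance almost exactly, and the population direction only up to estimation error.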
Abstract
We study attention mechanisms through the lens of a canonical unsupervised problem: principal component analysis (PCA). We show that, when trained on Gaussian data, both softmax and linear attention layers learn parameters that align with the principal eigenvectors of the covariance matrix, thereby establishing a direct and explicit connection with PCA. Our analysis covers both finite and infinite prompt regimes. In the infinite-prompt limit, we prove convergence to globally optimal solutions aligned with the leading spectral direction, while in the finite-prompt setting we show that the same behavior emerges up to sampling effects. We further extend the analysis to an in-context setting with spiked Wishart covariances, where attention successfully recovers the underlying signal direction. These results demonstrate that attention inherently performs PCA-like computations under…
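
For readers unfamiliar with spiked models, the following sketch shows what "recovering the signal direction" means in the simplest case. It assumes a rank-one spike, i.e. samples drawn from N(0, I + snr * u u^T), and recovers u with ordinary PCA on the sample covariance; the paper's exact in-context setup may differ, and the names `sample_spiked`, `top_eigenvector`, and `snr` are hypothetical.

```python
import numpy as np

def sample_spiked(n, d, snr, rng):
    """Draw n samples from N(0, I_d + snr * u u^T) for a random unit vector u."""
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    noise = rng.standard_normal((n, d))   # isotropic part, covariance I_d
    signal = rng.standard_normal((n, 1))  # one scalar coefficient per sample
    x = noise + np.sqrt(snr) * signal * u # adds snr * u u^T to the covariance
    return x, u

def top_eigenvector(x):
    """Leading eigenvector of the sample covariance (classical PCA)."""
    cov = x.T @ x / x.shape[0]
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return eigvecs[:, -1]

rng = np.random.default_rng(0)
x, u = sample_spiked(n=5000, d=20, snr=5.0, rng=rng)
v = top_eigenvector(x)
alignment = abs(v @ u)                    # eigenvector sign is arbitrary
print(f"|cos(estimate, spike)| = {alignment:.4f}")
```

With many samples relative to the dimension and a strong spike, the alignment is close to 1; it degrades as `snr` shrinks or as `d / n` grows.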
