Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

Rachel S.Y. Teo; Tan M. Nguyen

arXiv:2406.13762 · cs.LG · November 1, 2024 · 1 citation

TL;DR

This paper reveals the underlying structure of self-attention in transformers through kernel PCA, providing a theoretical foundation and introducing a robust attention variant that improves performance across multiple tasks.

Contribution

The paper derives self-attention from kernel PCA, gives an explicit formula for the value matrix, and proposes RPC-Attention, an attention mechanism that is robust to data contamination.
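To make the kernel PCA view concrete, here is a minimal NumPy sketch. It is not the authors' code (their official PyTorch implementation is the rachtsy/kpca_code repository), and the function names exp_kernel and kernel_pca_projection are hypothetical. It assumes the exponential kernel exp(q·k/√D) that underlies softmax attention and, as a simplifying assumption, skips centering the Gram matrix. Each query is projected onto the top principal axes u_d = Σ_j a_dj φ(k_j) of the keys, which reduces to h_d = Σ_j a_dj κ(q, k_j). In this simplified setting, that projection equals softmax attention whose value vectors hold the (rescaled) Gram-matrix eigenvector coefficients, up to the per-query normalizer g(q) = Σ_j κ(q, k_j); the paper's exact construction may handle centering and normalization differently.

```python
import numpy as np


def exp_kernel(X, Y):
    """Exponential (softmax-style) kernel exp(x.y / sqrt(D)) for all row pairs."""
    return np.exp(X @ Y.T / np.sqrt(X.shape[1]))


def kernel_pca_projection(Q, K, n_components):
    """Project each query onto the top principal axes of the keys in feature space.

    Simplifying assumption: the Gram matrix is NOT centered.
    Returns H (queries x components) and A (keys x components), where column d
    of A holds the coefficients of axis u_d = sum_j A[j, d] * phi(k_j).
    """
    G = exp_kernel(K, K)                       # N x N Gram matrix of the keys
    eigvals, eigvecs = np.linalg.eigh(G)       # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    lam, A = eigvals[top], eigvecs[:, top]
    A = A / np.sqrt(lam)                       # rescale so every axis u_d has unit norm
    H = exp_kernel(Q, K) @ A                   # H[i, d] = sum_j A[j, d] * kappa(q_i, k_j)
    return H, A


# Usage: the projection equals softmax attention with value matrix V = A,
# up to the per-query normalizer g(q) = sum_j kappa(q, k_j).
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
H, A = kernel_pca_projection(Q, K, n_components=3)
scores = exp_kernel(Q, K)
g = scores.sum(axis=1, keepdims=True)
attn = (scores / g) @ A                        # softmax(QK^T / sqrt(D)) @ V with V = A
print(np.allclose(H, g * attn))                # True
```

The unit-norm rescaling follows from the eigen-equation: if G a_d = λ_d a_d with ‖a_d‖ = 1, then ‖u_d‖² = a_dᵀ G a_d / λ_d = 1.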

Findings

1. Kernel PCA explains self-attention as a projection of the query vectors onto the principal component axes of the key matrix in a feature space.
2. RPC-Attention outperforms softmax attention on multiple benchmarks.
3. The kernel PCA framework is validated both theoretically and empirically.

Abstract

The remarkable success of transformers in sequence modeling tasks, spanning various applications in natural language processing and computer vision, is attributed to the critical role of self-attention. Similar to the development of most deep learning models, the construction of these attention mechanisms relies on heuristics and experience. In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space. We then derive the exact formula for the value matrix in self-attention, theoretically and empirically demonstrating that this value matrix captures the eigenvectors of the Gram matrix of the key vectors in self-attention. Leveraging our kernel PCA framework, we propose Attention with Robust Principal Components (RPC-Attention), a…
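For background on "robust principal components": a standard way to compute them is Principal Component Pursuit (PCP) from Candès et al., which splits a matrix M into a low-rank part L and a sparse part S by minimizing ‖L‖_* + λ‖S‖_1 subject to L + S = M. The sketch below is that generic algorithm, solved with a standard ADMM iteration; it is not RPC-Attention itself, and how the paper adapts robust PCA inside attention is not reproduced here. The function names and the default λ = 1/√max(m, n) and μ = mn / (4‖M‖_1) are conventional choices assumed for illustration.

```python
import numpy as np


def shrink(X, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def principal_component_pursuit(M, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split M into low-rank L plus sparse S via ADMM on
    min ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam   # default from Candes et al.
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                        # Lagrange multiplier (dual variable)
    norm_M = np.linalg.norm(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse update
        residual = M - L - S
        Y = Y + mu * residual                   # dual ascent step
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Usage: recover a rank-2 matrix from a copy with ~5% grossly corrupted entries.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))
S0 = np.zeros_like(L0)
mask = rng.random(L0.shape) < 0.05
S0[mask] = rng.uniform(-10, 10, size=mask.sum())
L, S = principal_component_pursuit(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))   # relative error; small when recovery succeeds
```

The sparse term S absorbs large, isolated corruptions that would otherwise distort ordinary principal components, which is the kind of contamination robustness the abstract refers to.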

Code & Models

Repositories

rachtsy/kpca_code (PyTorch, official)

Videos

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications

Methods: Attention Is All You Need · Principal Components Analysis · Softmax