Beyond Nyströmformer -- Approximation of self-attention by Spectral Shifting
Madhusudan Verma

TL;DR
This paper introduces a spectral shifting method to approximate self-attention in transformers, offering a more accurate alternative to Nyström-based methods with similar linear time complexity.
Contribution
The paper proposes a spectral shifting technique for self-attention approximation that improves accuracy over Nyström methods while maintaining linear time complexity.
Findings
Spectral shifting provides a stronger error bound than the Nyström approximation.
The proposed method achieves $O(n)$ time complexity, comparable to that of Nyströmformer.
Experimental results demonstrate improved approximation accuracy.
Abstract
The Transformer is a powerful tool for many natural language tasks. It is based on self-attention, a mechanism that encodes the dependence of other tokens on each specific token, but computing self-attention is a bottleneck because of its quadratic time complexity. Various approaches reduce this cost, and matrix approximation is one of them. In Nyströmformer, the authors used a Nyström-based method to approximate the softmax. The Nyström method generates a fast approximation to any large-scale symmetric positive semidefinite (SPSD) matrix using only a few columns of the SPSD matrix. However, since the Nyström approximation is low-rank, it is of low accuracy when the spectrum of the SPSD matrix decays slowly. Here an alternative approximation method is proposed which has a much stronger error bound than the Nyström method.…
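The contrast drawn in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm. It compares the standard Nyström approximation with a simple least-squares fit of the form "low-rank part plus a multiple of the identity", which is the general idea behind spectral shifting. The function names (`nystrom`, `shifted_fit`, `demo`), the synthetic test matrix, and the choice to fit the shift using the full matrix are all assumptions made for illustration. Because it uses the full matrix, this fit is not linear-time.

```python
import numpy as np

def nystrom(K, idx):
    """Standard Nystrom approximation K ~ C W^+ C^T from the columns in idx."""
    C = K[:, idx]                    # n x c sampled columns
    W = K[np.ix_(idx, idx)]          # c x c intersection block
    return C @ np.linalg.pinv(W) @ C.T

def shifted_fit(K, idx):
    """Illustrative fit K ~ Q Z Q^T + delta * I (hypothetical helper).

    Q is an orthonormal basis of the sampled columns (assumed to have full
    column rank). Z and delta minimise the Frobenius error in closed form.
    This uses all of K, so it costs O(n^2) and is only a demonstration.
    """
    n = K.shape[0]
    Q, _ = np.linalg.qr(K[:, idx])   # reduced QR: Q is n x c
    r = Q.shape[1]
    QtKQ = Q.T @ K @ Q
    # Average of the spectrum left outside the span of Q.
    delta = (np.trace(K) - np.trace(QtKQ)) / (n - r)
    Z = QtKQ - delta * np.eye(r)
    return Q @ Z @ Q.T + delta * np.eye(n), delta

def rel_error(K, K_hat):
    """Relative Frobenius-norm error of an approximation."""
    return np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")

def demo(n=200, rank=5, c=20, shift=2.0, seed=0):
    """Build an SPSD matrix whose spectrum stays flat after the top `rank` values."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, rank))
    K = B @ B.T + shift * np.eye(n)  # slowly decaying (flat) tail of eigenvalues
    idx = rng.choice(n, size=c, replace=False)
    K_ss, delta = shifted_fit(K, idx)
    return rel_error(K, nystrom(K, idx)), rel_error(K, K_ss), delta

if __name__ == "__main__":
    err_nystrom, err_shifted, delta = demo()
    print("Nystrom relative error:", err_nystrom)
    print("Shifted fit relative error:", err_shifted)
    print("Fitted shift delta:", delta)
```

With the same sampled columns, the shifted fit can never have a larger Frobenius error than the Nyström approximation, since setting the shift to zero recovers a purely low-rank model. The gap is widest when the tail of the spectrum is flat, as in the synthetic matrix used here. A softmax attention matrix is generally not SPSD, so applying either idea to self-attention needs the additional steps described in the respective papers.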
Taxonomy
Topics: Tensor decomposition and applications · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
