
TL;DR
This paper extends the linear transformer model by introducing a second-order approximation of the softmax normalization, aiming to improve efficiency while maintaining performance.
Contribution
It presents a novel second-order approximation method for the attention mechanism in linear transformers, building upon previous linearization techniques.
Findings
Achieves linear complexity in attention computation
Maintains comparable performance to standard transformers
Demonstrates improved efficiency in large-scale models
Abstract
This work follows up on the linear transformer of Katharopoulos et al., which itself builds on an idea from Shen et al. The trick that gives the attention mechanism linear complexity is reused and extended to a second-order approximation of the softmax normalization.
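The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes the second-order approximation is the Taylor expansion exp(s) ≈ 1 + s + s²/2 of the attention scores, and that the linear-complexity trick is the usual reordering of the matrix products so that the keys and values are summarized once. The names `taylor2_features`, `linear_attention_taylor2`, and `quadratic_reference` are hypothetical, and the attention is non-causal.

```python
import numpy as np

def taylor2_features(x):
    """Map rows of x (n, d) to phi(x) such that
    phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2  (assumed second-order expansion)."""
    n, d = x.shape
    ones = np.ones((n, 1))
    # Flattened outer products x x^T; the 1/sqrt(2) factor yields (q.k)^2 / 2.
    second = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, second], axis=1)

def linear_attention_taylor2(Q, K, V):
    """Attention with cost linear in sequence length n.
    The feature dimension grows to 1 + d + d^2, which is the price paid."""
    d = Q.shape[1]
    scale = d ** -0.25  # splits the usual 1/sqrt(d) scaling between Q and K
    phi_q = taylor2_features(Q * scale)
    phi_k = taylor2_features(K * scale)
    kv = phi_k.T @ V        # summary of keys and values, computed once
    z = phi_k.sum(axis=0)   # summary used for the normalization
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_reference(Q, K, V):
    """Same weights computed the O(n^2) way, for comparison only."""
    s = (Q @ K.T) / np.sqrt(Q.shape[1])
    w = 1.0 + s + 0.5 * s ** 2  # always positive, so the row sums are nonzero
    return (w @ V) / w.sum(axis=1, keepdims=True)
```

Both functions compute the same result; the linear version never forms the n × n score matrix, which is where the saving comes from when n is much larger than d².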
Taxonomy
Topics: Neural Networks and Applications · Blind Source Separation Techniques · Face and Expression Recognition
Methods: Softmax
