Higher Order Linear Transformer

Jean Mercat

arXiv:2010.14816·cs.LG·October 29, 2020

Open Access

TL;DR

This paper extends the linear transformer by replacing the softmax normalization with a second-order approximation, so that attention cost stays linear in sequence length.

Contribution

The paper presents a second-order approximation of the softmax normalization for the attention mechanism in linear transformers, building on the linearization techniques of Katharopoulos et al. and Shen et al.
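One way to read "second-order approximation" is as a truncated Taylor expansion of the exponential inside the softmax. The derivation below is a sketch under that assumption, not the paper's exact formulation. The symbols q_i, k_j, v_j (query, key, value vectors), o_i (output) and n (sequence length) are introduced here for illustration, and the usual 1/sqrt(d) scaling is omitted.

```latex
% Softmax attention for one query q_i over n keys k_j and values v_j:
\[
  o_i = \frac{\sum_{j=1}^{n} \exp(q_i^\top k_j)\, v_j}
             {\sum_{j=1}^{n} \exp(q_i^\top k_j)}
\]

% Assumed approximation: second-order Taylor expansion of exp around 0.
\[
  \exp(x) \approx 1 + x + \tfrac{1}{2}x^2
\]

% Substituting, and using (q^\top k)^2 = (q \otimes q)^\top (k \otimes k),
% every sum over j becomes independent of the query:
\[
  o_i \approx
  \frac{\sum_j v_j
        + \Big(\sum_j v_j k_j^\top\Big) q_i
        + \tfrac{1}{2}\Big(\sum_j v_j\,(k_j \otimes k_j)^\top\Big)(q_i \otimes q_i)}
       {n
        + q_i^\top \sum_j k_j
        + \tfrac{1}{2}\, q_i^\top \Big(\sum_j k_j k_j^\top\Big) q_i}
\]
```

The bracketed sums are computed once and reused for every query, which is where the linear dependence on n comes from. The denominator stays positive because 1 + x + x^2/2 = ((x + 1)^2 + 1)/2 is at least 1/2 for every real x.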

Findings

01. Achieves linear complexity in attention computation

02. Maintains comparable performance to standard transformers

03. Demonstrates improved efficiency in large-scale models

Abstract

This work follows up on the linear transformer part of the article by Katharopoulos et al., which takes the idea from Shen et al. The trick that gives the attention mechanism linear complexity is reused and extended to a second-order approximation of the softmax normalization.
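The linear-complexity trick can be illustrated with a minimal NumPy sketch. It assumes the second-order approximation is the Taylor truncation exp(x) ≈ 1 + x + x²/2 and covers only non-causal attention without the 1/sqrt(d) scaling; the paper's actual formulation may differ. The function names `taylor2`, `quadratic_attention` and `linear_attention` are hypothetical and chosen for this example.

```python
import numpy as np


def taylor2(x):
    """Second-order Taylor expansion of exp(x) around 0 (assumed approximation)."""
    return 1.0 + x + 0.5 * x**2


def quadratic_attention(Q, K, V):
    """Reference version: builds the full (n_queries, n_keys) weight matrix."""
    W = taylor2(Q @ K.T)
    return (W @ V) / W.sum(axis=1, keepdims=True)


def linear_attention(Q, K, V):
    """Same output, but keys and values are summarized once, independently of Q.

    Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v).
    Cost grows linearly with n_keys and n_queries, and as d^2 * d_v in the
    feature dimensions because of the second-order term.
    """
    # Numerator summaries: orders 0, 1 and 2 in the keys.
    S0 = V.sum(axis=0)                          # (d_v,)
    S1 = K.T @ V                                # (d, d_v)
    S2 = np.einsum("na,nb,nc->abc", K, K, V)    # (d, d, d_v)

    # Denominator summaries (the approximated softmax normalization).
    Z0 = float(K.shape[0])
    Z1 = K.sum(axis=0)                          # (d,)
    Z2 = K.T @ K                                # (d, d)

    num = S0 + Q @ S1 + 0.5 * np.einsum("na,nb,abc->nc", Q, Q, S2)
    den = Z0 + Q @ Z1 + 0.5 * np.einsum("na,nb,ab->n", Q, Q, Z2)
    return num / den[:, None]
```

Both functions compute the same quantity, so comparing them with `np.allclose` on random inputs is a quick sanity check. The linear version never materializes the query-key matrix, at the price of a (d, d, d_v) summary tensor.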

Taxonomy

Topics: Neural Networks and Applications · Blind Source Separation Techniques · Face and Expression Recognition

Methods: Softmax