Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries
Chao Ma, Lexing Ying

TL;DR
This paper demonstrates that self-attention mechanisms naturally arise from symmetry considerations, specifically orthogonal equivariance, making them well-suited for sequence-to-sequence problems involving knowledge integration.
Contribution
The paper provides a theoretical perspective showing that self-attention structures are a natural consequence of symmetry properties in seq2seq functions with knowledge.
Findings
Self-attention is derived from orthogonal equivariance principles.
Seq2seq functions with knowledge naturally take a form similar to self-attention.
Symmetry considerations justify the use of self-attention in language processing tasks.
Abstract
In this paper, we show that structures similar to self-attention are natural to learn many sequence-to-sequence problems from the perspective of symmetry. Inspired by language processing applications, we study the orthogonal equivariance of seq2seq functions with knowledge, which are functions taking two inputs (an input sequence and a "knowledge") and outputting another sequence. The knowledge consists of a set of vectors in the same embedding space as the input sequence, containing the information of the language used to process the input sequence. We show that orthogonal equivariance in the embedding space is natural for seq2seq functions with knowledge, and under such equivariance the function must take the form close to the self-attention. This shows that network structures similar to self-attention are the right structures to represent the target function of many seq2seq…
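The equivariance property described in the abstract can be checked numerically on a toy example. The sketch below is not the paper's construction: `attention_with_knowledge` is a hypothetical, parameter-free attention map in which each input token attends over the input tokens together with the knowledge vectors, and `random_orthogonal` is an illustrative helper. It assumes orthogonal equivariance means f(QX, QK) = Q f(X, K) for every orthogonal matrix Q acting on the embedding space, with tokens stored as rows.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_knowledge(X, K):
    """Hypothetical toy map: X is n tokens (rows), K is m knowledge vectors (rows).

    Scores use only inner products, and outputs are weighted sums of the
    input and knowledge vectors, so no learned parameters are involved.
    """
    Z = X + K  # stack tokens and knowledge vectors: (n + m) rows
    scale = math.sqrt(len(X[0]))
    scores = matmul(X, transpose(Z))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, Z)


def random_orthogonal(d, rng):
    """Build an orthogonal d x d matrix from plane rotations plus one reflection."""
    Q = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            G = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
            G[i][i] = G[j][j] = math.cos(theta)
            G[i][j] = -math.sin(theta)
            G[j][i] = math.sin(theta)
            Q = matmul(G, Q)
    Q[0] = [-v for v in Q[0]]  # flip one row so reflections are covered too
    return Q


def apply_to_rows(Q, M):
    """Apply Q to every row vector of M, i.e. compute M Q^T."""
    return matmul(M, transpose(Q))


rng = random.Random(0)
n, m, d = 5, 3, 4
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
K = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
Q = random_orthogonal(d, rng)

lhs = attention_with_knowledge(apply_to_rows(Q, X), apply_to_rows(Q, K))
rhs = apply_to_rows(Q, attention_with_knowledge(X, K))
equivariance_gap = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
print(f"max |f(QX, QK) - Q f(X, K)| = {equivariance_gap:.2e}")
```

The gap should sit at floating-point round-off level. The reason is that the attention scores depend only on inner products, which an orthogonal Q preserves, while the output is a linear combination of the transformed vectors. The check illustrates one direction only (attention-like maps are equivariant); the abstract's stronger claim, that equivariance forces a form close to self-attention, is the paper's result and is not demonstrated by this sketch.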
Taxonomy
Topics: Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
