Self-Attention with Relative Position Representations

Peter Shaw; Jakob Uszkoreit; Ashish Vaswani

arXiv:1803.02155 · cs.CL · April 16, 2018

TL;DR

This paper enhances the Transformer model by incorporating relative position representations into self-attention, improving translation quality while keeping the extension efficient.

Contribution

It introduces an efficient method for integrating relative position information into self-attention that outperforms absolute position representations on the WMT 2014 English-to-German and English-to-French translation tasks.

Findings

01 Improved BLEU scores on the WMT 2014 English-to-German (+1.3) and English-to-French (+0.3) translation tasks

02 Relative position representations outperform absolute ones

03 Combining both types does not further improve results

Abstract

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation…
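The abstract describes extending self-attention to consider the relative distances between sequence elements. The sketch below shows one attention head under the formulation of the full paper (not reproduced on this page): learned vectors indexed by the clipped distance clip(j - i, k) are added to the keys when computing compatibility scores, e_ij = q_i · (k_j + a^K_ij) / sqrt(d_z), and to the values when forming outputs, z_i = Σ_j α_ij (v_j + a^V_ij). It is a minimal pure-Python illustration, not the authors' implementation; the helper names (`clip`, `project`, `relative_self_attention`), the clipping distance `k`, and the toy random weights are all assumptions made for this example.

```python
import math
import random


def clip(distance, k):
    # Clip the relative distance j - i to the window [-k, k].
    # k is a hyperparameter; the value used below is illustrative only.
    return max(-k, min(k, distance))


def project(vec, weight):
    # Row vector (length d_in) times matrix (d_in x d_out).
    return [sum(vec[a] * weight[a][b] for a in range(len(vec)))
            for b in range(len(weight[0]))]


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_self_attention(x, w_q, w_k, w_v, rel_k, rel_v, k):
    """One attention head with relative position representations (sketch).

    rel_k and rel_v each hold 2k + 1 vectors of size d_z, one per clipped
    relative distance -k..k, looked up at index clip(j - i, k) + k.
    """
    n = len(x)
    d_z = len(w_q[0])
    queries = [project(xi, w_q) for xi in x]
    keys = [project(xi, w_k) for xi in x]
    values = [project(xi, w_v) for xi in x]
    outputs = []
    for i in range(n):
        # Compatibility scores: e_ij = q_i . (k_j + a^K_ij) / sqrt(d_z)
        scores = []
        for j in range(n):
            a_k = rel_k[clip(j - i, k) + k]
            key = [keys[j][c] + a_k[c] for c in range(d_z)]
            dot = sum(q * kc for q, kc in zip(queries[i], key))
            scores.append(dot / math.sqrt(d_z))
        alpha = softmax(scores)
        # Output: z_i = sum_j alpha_ij * (v_j + a^V_ij)
        z = [0.0] * d_z
        for j in range(n):
            a_v = rel_v[clip(j - i, k) + k]
            for c in range(d_z):
                z[c] += alpha[j] * (values[j][c] + a_v[c])
        outputs.append(z)
    return outputs


def random_matrix(rows, cols, rng):
    # Toy Gaussian weights for demonstration; not trained parameters.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d_x, d_z, k = 5, 4, 3, 2  # toy sizes chosen for illustration
    x = random_matrix(n, d_x, rng)
    w_q, w_k, w_v = (random_matrix(d_x, d_z, rng) for _ in range(3))
    rel_k = random_matrix(2 * k + 1, d_z, rng)
    rel_v = random_matrix(2 * k + 1, d_z, rng)
    z = relative_self_attention(x, w_q, w_k, w_v, rel_k, rel_v, k)
    print(len(z), len(z[0]))
```

Because the added vectors depend only on the clipped distance, three identical input tokens still receive different outputs at different positions once `rel_v` is non-zero, which is exactly the position signal absolute encodings would otherwise supply. The explicit double loop costs O(n² · d_z) per head and is written for readability; a practical implementation would vectorize these sums.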

Code & Models

Models

Pendrokar/xvapitch (🤗 Hugging Face model)

Videos

Self-Attention with Relative Position Representations – Paper explained (YouTube)

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Relative Position Encodings · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam