Multiplicative Position-aware Transformer Models for Language Understanding

Zhiheng Huang; Davis Liang; Peng Xu; Bing Xiang

arXiv:2109.12788 · cs.CL · September 28, 2021 · 1 citation

TL;DR

This paper reviews existing position embedding methods in Transformer models, introduces a novel multiplicative approach that improves NLP task accuracy, and demonstrates its effectiveness on benchmark datasets.

Contribution

The paper systematically compares position embedding methods and proposes a new multiplicative method that enhances model performance on NLP benchmarks.

Findings

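01. The proposed multiplicative position embedding yields higher accuracy than the existing position embedding methods it is compared against.

02. Replacing the default absolute position embedding with the new method improves RoBERTa models on SQuAD datasets.

03. A systematic comparison clarifies how different position embedding methods affect accuracy on downstream NLP tasks.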

Abstract

Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. To capture positional ordering information, various flavors of absolute and relative position embeddings have been proposed. However, there is no systematic analysis of their contributions, and a comprehensive comparison of these methods is missing from the literature. In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks, using our own implementations. We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods. Finally, we show that our proposed embedding method, used as a drop-in replacement for the default absolute position embedding, can improve the…
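The claim that self-attention is position agnostic can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single attention head with identity query, key, and value projections, and the names `self_attention`, `add_positions`, and the random `pos` table are hypothetical, chosen only for illustration. It shows that reordering the input tokens merely reorders the outputs, and that adding an absolute position embedding to the inputs removes this symmetry.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


random.seed(0)
n, d = 5, 4
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
perm = [3, 0, 4, 1, 2]  # an arbitrary reordering of the sequence

# Without position information: permuting the inputs just permutes the outputs.
out = self_attention(tokens)
out_perm = self_attention([tokens[i] for i in perm])
diff_no_pos = max_abs_diff([out[i] for i in perm], out_perm)

# Hypothetical absolute position table: row i is added to whatever token sits at position i.
pos = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]


def add_positions(x):
    return [[t + p for t, p in zip(row, prow)] for row, prow in zip(x, pos)]


# With position embeddings: the same token is represented differently at a new position.
out_p = self_attention(add_positions(tokens))
out_p_perm = self_attention(add_positions([tokens[i] for i in perm]))
diff_with_pos = max_abs_diff([out_p[i] for i in perm], out_p_perm)

print("difference without positions:", diff_no_pos)
print("difference with positions:", diff_with_pos)
```

The first difference is zero up to floating-point rounding, while the second is not. This is the gap that absolute and relative position embeddings are meant to close.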

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems