Multiplicative Position-aware Transformer Models for Language Understanding
Zhiheng Huang, Davis Liang, Peng Xu, Bing Xiang

TL;DR
This paper reviews existing position embedding methods in Transformer models, introduces a novel multiplicative approach that improves NLP task accuracy, and demonstrates its effectiveness on benchmark datasets.
Contribution
The paper systematically compares position embedding methods and proposes a new multiplicative method that enhances model performance on NLP benchmarks.
Findings
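The proposed multiplicative position embedding achieves higher accuracy on downstream NLP tasks than the existing position embedding methods it is compared against.
Used as a drop-in replacement for the default absolute position embedding, the new method improves RoBERTa models on SQuAD datasets.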
Systematic analysis clarifies the impact of position embeddings on NLP accuracy.
Abstract
Transformer models, which leverage architectural improvements like self-attention, perform remarkably well on Natural Language Processing (NLP) tasks. The self-attention mechanism is position agnostic. In order to capture positional ordering information, various flavors of absolute and relative position embeddings have been proposed. However, there is no systematic analysis of their contributions, and a comprehensive comparison of these methods is missing in the literature. In this paper, we review major existing position embedding methods and compare their accuracy on downstream NLP tasks, using our own implementations. We also propose a novel multiplicative embedding method which leads to superior accuracy when compared to existing methods. Finally, we show that our proposed embedding method, used as a drop-in replacement for the default absolute position embedding, can improve the…
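The abstract's remark that self-attention is position agnostic can be checked with a small experiment. The following is a minimal sketch, not the authors' code: it assumes a single attention head with identity projections (queries, keys, and values are all the raw token vectors), and the names `self_attention` and `permute` are illustrative. It shows that reordering the input tokens merely reorders the outputs, so without position embeddings the layer has no way to tell one word order from another.

```python
import math


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: identity projections, so the
    queries, keys, and values are all the input vectors themselves.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def permute(rows, perm):
    """Reorder rows so that new position p holds the old row perm[p]."""
    return [rows[i] for i in perm]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
perm = [2, 0, 1]                               # an arbitrary reordering

# Attend first and reorder afterwards, versus reorder first and attend afterwards.
out_then_permute = permute(self_attention(tokens), perm)
permute_then_out = self_attention(permute(tokens, perm))

# Compare with a tolerance, since floating-point summation order differs.
is_equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(out_then_permute, permute_then_out)
    for a, b in zip(row_a, row_b)
)
print(is_equivariant)
```

Both orders of operation give the same rows, which is why Transformer models add absolute or relative position information before or inside the attention computation.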
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
