Assessing the Ability of Self-Attention Networks to Learn Word Order
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu

TL;DR
This paper measures how well self-attention networks (SANs) learn word order information. SANs trained directly on a word reordering detection task struggle to learn positional information, whereas SANs trained on machine translation learn word order better than RNNs.
Contribution
The study introduces a novel word reordering detection task to empirically evaluate the positional learning capabilities of SANs versus RNNs.
Findings
SAN trained on reordering detection struggles with positional info
SAN trained on machine translation learns better word order info
Position embedding is crucial for learning word order in SANs
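As a rough illustration of why a position signal matters for a SAN at all, the sketch below shows that plain dot-product self-attention treats its input as an unordered set: shuffling the words only shuffles the outputs. It assumes identity query/key/value projections, and `add_positions` is a hypothetical additive index signal, not the position embedding used in the paper.

```python
import math

def self_attention(xs):
    """Dot-product self-attention with identity projections (an assumption)."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(weights)
        out.append([sum(w / z * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out

def add_positions(xs):
    """Hypothetical position signal: add a scaled index to every feature."""
    return [[x + 0.1 * i for x in vec] for i, vec in enumerate(xs)]

def rows_match(shuffled_out, original_out, perm):
    """True if shuffled_out[i] equals original_out[perm[i]] for every i."""
    return all(
        math.isclose(a, b, abs_tol=1e-9)
        for i, p in enumerate(perm)
        for a, b in zip(shuffled_out[i], original_out[p])
    )

words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy word vectors
perm = [2, 0, 1]                               # a reordering of the sentence
shuffled = [words[i] for i in perm]

# Without positions, each word gets the same output wherever it sits.
same_without_positions = rows_match(self_attention(shuffled), self_attention(words), perm)
# With a position signal, the output depends on where the word sits.
same_with_positions = rows_match(
    self_attention(add_positions(shuffled)), self_attention(add_positions(words)), perm
)
print(same_without_positions, same_with_positions)
```

The first comparison holds because attention weights depend only on the content of the vectors; the second fails once each vector also carries its index.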
Abstract
Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Because they lack a recurrence structure such as that of recurrent neural networks (RNN), SANs are assumed to be weak at learning the positional information of words for sequence modeling. However, this speculation has not been empirically confirmed, nor have explanations been explored for their strong performance on machine translation tasks when "lacking positional information". To this end, we propose a novel word reordering detection task to quantify how well word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position, and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word…
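The data side of the word reordering detection task described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `reorder_one_word`, the zero-based index convention, and the choice to always pick a destination index different from the source are all assumptions.

```python
import random

def reorder_one_word(tokens, rng):
    """Move one randomly chosen token to a different position.

    Returns (new_tokens, original_pos, inserted_pos). original_pos indexes
    the input sentence; inserted_pos indexes the returned sentence.
    These label conventions are assumptions made for this sketch.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to move one")
    original_pos = rng.randrange(len(tokens))
    # Pick a destination index that differs from the source index.
    candidates = [i for i in range(len(tokens)) if i != original_pos]
    inserted_pos = rng.choice(candidates)
    new_tokens = list(tokens)
    word = new_tokens.pop(original_pos)
    new_tokens.insert(inserted_pos, word)
    return new_tokens, original_pos, inserted_pos

rng = random.Random(0)  # seeded for reproducibility
sentence = "the quick brown fox jumps over a lazy dog".split()
shuffled, original_pos, inserted_pos = reorder_one_word(sentence, rng)
print(shuffled, original_pos, inserted_pos)
```

A model under test would receive `shuffled` and be scored on predicting both positions. Note that moving a word by one place is indistinguishable from moving its neighbour the other way, so a real setup would need a rule for such cases.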
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
