Word Order Matters When You Increase Masking
Karim Lasri, Alessandro Lenci, Thierry Poibeau

TL;DR
This paper investigates how the importance of position encoding in Transformer models varies with the level of masking during training, revealing that more masking increases the need for explicit position information.
Contribution
It demonstrates that position encoding becomes crucial as masking increases, and models without position encoding struggle to recover positional information under high masking conditions.
Findings
The importance of position encoding rises with the masking level
Models without position encoding cannot reconstruct positional information under high masking
Masked language models rely more heavily on position encoding as masking increases
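One way to see why masked positions become hard to tell apart without position information is the sketch below. It is not the paper's model: it assumes a single self-attention head with random weights and no position encoding, and every name in it (`self_attention`, `random_matrix`, `rows_close`, the dimensions, the permutation) is illustrative. Under those assumptions, reordering the input only reorders the output, and two positions that share the same input vector, as two masked tokens would, receive the same output vector.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head self-attention with no position encoding (illustrative)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    # Dot product of every query with every key, scaled by sqrt(dim).
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def rows_close(a, b, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(a, b))


rng = random.Random(0)
dim = 4  # assumed embedding size, chosen only for the demo
wq, wk, wv = (random_matrix(rng, dim, dim) for _ in range(3))

x = random_matrix(rng, 5, dim)  # five token embeddings
x[3] = list(x[1])               # positions 1 and 3 share one "mask" embedding

perm = [2, 0, 4, 1, 3]          # an arbitrary reordering of the tokens
out = self_attention(x, wq, wk, wv)
out_perm = self_attention([x[i] for i in perm], wq, wk, wv)

# Shuffling the input only shuffles the output rows in the same way.
equivariant = all(rows_close(out[p], out_perm[j]) for j, p in enumerate(perm))
# The two masked slots are indistinguishable to this layer.
masks_identical = rows_close(out[1], out[3])
print(equivariant, masks_identical)
```

With one masked token this ambiguity is harmless, because there is only one slot to fill. With several masked tokens the layer produces the same representation for each of them, which is consistent with the finding that position information matters more as masking increases.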
Abstract
Word order, an essential property of natural languages, is injected in Transformer-based neural language models using position encoding. However, recent experiments have shown that explicit position encoding is not always useful, since some models without such a feature managed to achieve state-of-the-art performance on some tasks. To better understand this phenomenon, we examine the effect of removing position encodings on the pre-training objective itself (i.e., masked language modelling), to test whether models can reconstruct position information from co-occurrences alone. We do so by controlling the amount of masked tokens in the input sentence, as a proxy to affect the importance of position information for the task. We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to…
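The idea of controlling the amount of masked tokens can be sketched as follows. This is a minimal illustration, not the authors' procedure: the `mask_tokens` function, the `[MASK]` symbol, the `mask_rate` parameter, and the choice to mask an exact fraction of positions (rather than sampling each position independently) are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, assumed for this sketch


def mask_tokens(tokens, mask_rate, seed=0):
    """Replace a fixed fraction of tokens with MASK.

    Returns the masked sequence and a dict mapping each masked
    position to the original token the model would have to predict.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be between 0 and 1")
    rng = random.Random(seed)
    n_mask = round(mask_rate * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return masked, targets


sentence = "the cat sat on the mat".split()
for rate in (0.0, 0.5, 1.0):
    masked, targets = mask_tokens(sentence, rate)
    print(rate, masked, targets)
```

Sweeping `mask_rate` from low to high leaves fewer unmasked words as context, so co-occurrence cues shrink and the position of each masked slot carries more of the information needed to predict it.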
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
Methods: Test
