Word Order Does Matter (And Shuffled Language Models Know It)
Vinit Ravishankar, Mostafa Abdou, Artur Kulmizev, Anders Søgaard

TL;DR
Language models trained on shuffled text still retain word order information, partly because of subtle details in how the shuffling is implemented and partly because of statistical dependencies in the data. Many language understanding tasks still require genuine word order knowledge.
Contribution
The study uncovers that shuffled-trained language models retain word order information and highlights the importance of implementation details and statistical dependencies in this phenomenon.
Findings
Models trained on shuffled text retain word order information.
The retained word order information is partly an artifact of how shuffling is implemented: before rather than after subword segmentation.
Many language understanding tasks require genuine word order knowledge.
Abstract
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE, putting into question the importance of word order information. Somewhat counter-intuitively, some of these studies also report that position embeddings appear to be crucial for models' good performance with shuffled text. We probe these language models for word order information and investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order. We show this is in part due to a subtlety in how shuffling is implemented in previous work -- before rather than after subword segmentation. Surprisingly, we find even language models trained on text shuffled after subword segmentation retain some semblance of information about word order…
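The difference between shuffling before and after subword segmentation can be made concrete with a small sketch. This is an illustration only, not the authors' code: `toy_segment` is a hypothetical stand-in for a real subword tokenizer (fixed three-character chunks, with a WordPiece-style `##` continuation marker), and the function names are assumptions introduced here. The point it shows is that shuffling whole words first leaves each word's subword pieces adjacent and in their original order, so some local order survives, whereas shuffling the pieces themselves can break words apart.

```python
import random


def toy_segment(word, size=3):
    """Hypothetical subword segmenter: fixed-size chunks, '##' marks continuations."""
    pieces = [word[i:i + size] for i in range(0, len(word), size)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def shuffle_before_segmentation(sentence, rng):
    """Shuffle whole words, then segment each word into subword pieces."""
    words = sentence.split()
    rng.shuffle(words)
    return [piece for w in words for piece in toy_segment(w)]


def shuffle_after_segmentation(sentence, rng):
    """Segment first, then shuffle the individual subword pieces."""
    pieces = [piece for w in sentence.split() for piece in toy_segment(w)]
    rng.shuffle(pieces)
    return pieces


def detokenize(pieces):
    """Glue '##' continuation pieces back onto the preceding piece."""
    words = []
    for p in pieces:
        if p.startswith("##") and words:
            words[-1] += p[2:]
        else:
            words.append(p[2:] if p.startswith("##") else p)
    return words


if __name__ == "__main__":
    sentence = "language models retain ordering information"
    rng = random.Random(0)
    print(shuffle_before_segmentation(sentence, rng))
    print(shuffle_after_segmentation(sentence, rng))
```

Under these assumptions, `detokenize` always recovers the original vocabulary of whole words from the shuffle-before output, while the shuffle-after output generally does not reassemble into the original words.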
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
