Out of Order: How Important Is The Sequential Order of Words in a Sentence in Natural Language Understanding Tasks?
Thang M. Pham, Trung Bui, Long Mai, Anh Nguyen

TL;DR
This paper investigates how much word order matters to natural language understanding models. It finds that many BERT-based classifiers rely on superficial cues and are largely insensitive to changes in word order, which calls into question how well these models understand sentences.
Contribution
The study demonstrates that BERT-based models often ignore word order and rely on superficial cues, and that encouraging classifiers to capture word order information improves performance on most GLUE tasks and SQuAD 2.0.
Findings
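75% to 90% of the correct predictions of BERT-based classifiers remain unchanged after the input words are randomly shuffled
Although BERT embeddings are contextual, the contribution of each individual word to downstream tasks is almost unchanged after its context is shuffled
Superficial cues (e.g. keyword sentiment, or word-wise similarity between paired inputs) enable correct predictions even when tokens are in random order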
Abstract
Do state-of-the-art natural language understanding models care about word order - one of the most important characteristics of a sequence? Not always! We found 75% to 90% of the correct predictions of BERT-based classifiers, trained on many GLUE tasks, remain constant after input words are randomly shuffled. Although BERT embeddings are famously contextual, the contribution of each individual word to downstream tasks is almost unchanged even after the word's context is shuffled. BERT-based models are able to exploit superficial cues (e.g. the sentiment of keywords in sentiment analysis; or the word-wise similarity between sequence-pair inputs in natural language inference) to make correct decisions when tokens are arranged in random orders. Encouraging classifiers to capture word order information improves the performance on most GLUE tasks, SQuAD 2.0 and out-of-samples. Our work…
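The shuffling test described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or evaluation setup: the keyword-counting classifier, the tiny example set, and every function name below are hypothetical stand-ins chosen only to show how one might measure the fraction of originally correct predictions that survive a random word shuffle. A pure keyword counter ignores order by construction, so it represents the extreme case of the superficial-cue behavior the abstract describes.

```python
import random

# Hypothetical keyword lists for a toy sentiment classifier (not from the paper).
POSITIVE = {"great", "good", "wonderful", "enjoyable"}
NEGATIVE = {"bad", "boring", "terrible", "dull"}


def keyword_classifier(sentence):
    """Toy classifier: returns 1 (positive) if positive keywords outnumber negative ones."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


def shuffle_words(sentence, rng):
    """Return the sentence with its whitespace-separated words in random order."""
    words = sentence.split()
    rng.shuffle(words)
    return " ".join(words)


def fraction_unchanged(classify, examples, rng):
    """Among examples the classifier gets right, the share still right after shuffling."""
    correct = [(s, y) for s, y in examples if classify(s) == y]
    if not correct:
        return 0.0
    kept = sum(classify(shuffle_words(s, rng)) == y for s, y in correct)
    return kept / len(correct)


examples = [
    ("the movie was great and enjoyable", 1),
    ("a boring and terrible plot", 0),
    ("not good at all", 0),  # the keyword counter misclassifies this one
]

rng = random.Random(0)  # fixed seed so the shuffles are reproducible
ratio = fraction_unchanged(keyword_classifier, examples, rng)
print(ratio)
```

Because the toy classifier only counts keywords, every prediction it gets right stays right after shuffling. Passing a real model's prediction function as `classify` would give a figure comparable in spirit to the 75% to 90% range reported in the abstract, though the paper's exact protocol may differ from this sketch.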
Taxonomy
Methods: Linear Layer · Dropout · Softmax · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · Attention Is All You Need · Layer Normalization · WordPiece
