BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman

TL;DR
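BLiMP is a benchmark of 67 sub-datasets, each with 1000 minimal pairs, for evaluating what English language models know about major grammatical phenomena. Models handle morphological contrasts reliably but struggle with semantic restrictions on quantifiers and negative polarity items and with subtle syntactic phenomena such as extraction islands.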
Contribution
Introduces BLiMP, a large, automatically generated challenge set for systematically assessing language models' grasp of English grammar.
Findings
State-of-the-art models reliably identify morphological contrasts.
Models struggle with semantic restrictions on the distribution of quantifiers and negative polarity items.
Models also struggle with subtle syntactic phenomena such as extraction islands.
Abstract
We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.
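Example
The abstract does not spell out the scoring rule, so the following is a minimal sketch of how minimal-pair benchmarks of this kind are typically scored: a pair counts as correct when the model assigns a higher probability to the acceptable sentence than to the unacceptable one. Everything here is an illustrative assumption: the toy corpus, the example pairs (not taken from BLiMP), the add-one-smoothed bigram model, and the function names train_bigram, log_prob, and minimal_pair_accuracy.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab = model
    size = len(vocab) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + size))
        for a, b in zip(tokens[:-1], tokens[1:])
    )


def minimal_pair_accuracy(pairs, score):
    """Fraction of (acceptable, unacceptable) pairs where the
    acceptable sentence receives the strictly higher score."""
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


# Hypothetical toy training data and subject-verb agreement pairs.
corpus = [
    "the cats annoy tim",
    "the cat annoys tim",
    "the dogs annoy tim",
    "the dog annoys tim",
]
pairs = [
    ("the cats annoy tim", "the cats annoys tim"),
    ("the dog annoys tim", "the dog annoy tim"),
]

model = train_bigram(corpus)
accuracy = minimal_pair_accuracy(pairs, lambda s: log_prob(s, model))
print(accuracy)
```

Because minimal_pair_accuracy only needs a callable that returns a sentence score, the bigram scorer can be swapped for any other LM's sentence log-probability without changing the evaluation loop.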
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
