Probing for targeted syntactic knowledge through grammatical error detection
Christopher Davis, Christopher Bryant, Andrew Caines, Marek Rei, Paula Buttery

TL;DR
This paper investigates whether pre-trained language models can reliably detect subject-verb agreement errors. It finds that some models encode the relevant syntactic information, but that performance varies with the training data and the syntactic construction.
Contribution
The study introduces grammatical error detection as a diagnostic tool to assess syntactic knowledge in language models, highlighting limitations in robustness and consistency.
Findings
Masked language models linearly encode information relevant to detecting subject-verb agreement (SVA) errors.
Autoregressive models perform similarly to the baseline.
Performance varies with the training data and the syntactic construction.
Abstract
Targeted studies testing knowledge of subject-verb agreement (SVA) indicate that pre-trained language models encode syntactic information. We assert that if models robustly encode subject-verb agreement, they should be able to identify when agreement is correct and when it is incorrect. To that end, we propose grammatical error detection as a diagnostic probe to evaluate token-level contextual representations for their knowledge of SVA. We evaluate contextual representations at each layer from five pre-trained English language models: BERT, XLNet, GPT-2, RoBERTa, and ELECTRA. We leverage public annotated training data from both English second language learners and Wikipedia edits, and report results on manually crafted stimuli for subject-verb agreement. We find that masked language models linearly encode information relevant to the detection of SVA errors, while the autoregressive…
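The diagnostic probe described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the probe is a logistic-regression classifier trained on frozen token vectors, and it replaces real model representations with synthetic vectors in which the error label is deliberately made linearly decodable. All function names (make_toy_tokens, train_probe, probe_accuracy) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_tokens(n, dim, rng):
    """Synthetic stand-ins for frozen token vectors (assumption, not real data).

    Label 1 marks a token carrying an agreement error, label 0 a correct token.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Plant a linearly decodable signal in the first dimension.
        vec[0] += 3.0 if label == 1 else -3.0
        data.append((vec, label))
    return data


def train_probe(data, epochs=300, lr=0.1):
    """Fit a linear probe (logistic regression) by batch gradient descent."""
    dim = len(data[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for vec, label in data:
            score = sum(w * x for w, x in zip(weights, vec)) + bias
            err = sigmoid(score) - label
            for i, x in enumerate(vec):
                grad_w[i] += err * x
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def probe_accuracy(weights, bias, data):
    """Fraction of tokens whose error label the probe predicts correctly."""
    correct = 0
    for vec, label in data:
        score = sum(w * x for w, x in zip(weights, vec)) + bias
        correct += (1 if score > 0 else 0) == label
    return correct / len(data)


rng = random.Random(0)
train = make_toy_tokens(200, 4, rng)
test = make_toy_tokens(100, 4, rng)
weights, bias = train_probe(train)
accuracy = probe_accuracy(weights, bias, test)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In the setting the abstract describes, one such probe would be trained on the representations from each layer of each model, and the held-out scores compared across layers. Keeping the probe linear matters for the interpretation: high accuracy then suggests the information is readily available in the representation rather than computed by the probe itself.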
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Byte Pair Encoding · SentencePiece · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Linear Warmup With Linear Decay · Attention Dropout
