Implicit Representations of Grammaticality in Language Models
Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim

TL;DR
This paper investigates whether pretrained language models implicitly learn a grammaticality distinction separate from string probability, using a linear probe trained on grammatical sentences and synthetically perturbed ungrammatical ones.
Contribution
It demonstrates that LMs develop an implicit grammaticality representation that outperforms probability-based judgments on grammaticality benchmarks and generalizes across languages.
Findings
The grammaticality probe outperforms string probability on human-curated grammaticality judgment benchmarks.
Probe scores correlate weakly with string probabilities.
The probe generalizes across multiple languages.
Abstract
Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a grammaticality distinction distinct from string probability? We explore this question through studying internal representations of LMs, by training a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and…
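The probing setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the adjacent-word swap in `perturb` is a hypothetical perturbation (the abstract does not say which perturbations were used), the random vectors from `make_toy_features` merely stand in for LM hidden states, and the probe is a plain logistic regression fitted by gradient descent. All function names and hyperparameters here are assumptions.

```python
import math
import random


def perturb(tokens, rng):
    """Hypothetical perturbation: swap one adjacent token pair to make a
    synthetic 'ungrammatical' variant of a sentence (needs >= 2 tokens)."""
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def make_toy_features(n, dim, rng):
    """Stand-in for LM hidden states (assumption): vectors for label 1
    ('grammatical') are shifted along a fixed direction, label 0 against it."""
    direction = [1.0 if j % 2 == 0 else -1.0 for j in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        x = [rng.gauss(0.0, 1.0) + sign * d for d in direction]
        data.append((x, label))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_score(w, b, x):
    """Linear probe score; positive means 'grammatical'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(data, dim, lr=0.1, epochs=100):
    """Fit the linear probe with full-batch gradient descent on log loss."""
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(probe_score(w, b, x)) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, data):
    """Fraction of examples where the sign of the score matches the label."""
    correct = sum((probe_score(w, b, x) > 0) == (y == 1) for x, y in data)
    return correct / len(data)


rng = random.Random(0)
sentence = "the cat sat on the mat".split()
print(" ".join(perturb(sentence, rng)))  # one synthetic negative example

train_set = make_toy_features(200, 8, rng)
held_out = make_toy_features(100, 8, rng)
w, b = train_probe(train_set, dim=8)
print("held-out probe accuracy:", accuracy(w, b, held_out))
```

In a real experiment the feature vectors would be hidden states extracted from a pretrained LM for each original and perturbed sentence, and the probe score could then be compared against the same model's string probability on a held-out benchmark.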
