TL;DR
This paper tests whether language models group linguistic environments by their semantic monotonicity properties, and whether those groupings play the same role in the models as in human language understanding, using negative polarity item licensing as a case study.
Contribution
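It introduces an experimental pipeline that combines probing with diagnostic classifiers (DCs), linguistic acceptability tasks, and a novel DC ranking method that ties the probing results to the inner workings of the LM, and applies this pipeline to LMs trained on various filtered corpora.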
Findings
Language models form categories based on semantic monotonicity.
These categories play a role in negative polarity item (NPI) licensing similar to the role they play in human language understanding.
Semantic generalizations are reflected in model internal representations.
Abstract
We investigate the semantic knowledge of language models (LMs), focusing on (1) whether these LMs create categories of linguistic environments based on their semantic monotonicity properties, and (2) whether these categories play a similar role in LMs as in human language understanding, using negative polarity item licensing as a case study. We introduce a series of experiments consisting of probing with diagnostic classifiers (DCs), linguistic acceptability tasks, as well as a novel DC ranking method that tightly connects the probing results to the inner workings of the LM. By applying our experimental pipeline to LMs trained on various filtered corpora, we are able to gain stronger insights into the semantic generalizations that are acquired by these models.
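As a rough illustration of what probing with a diagnostic classifier involves, the sketch below trains a logistic-regression probe to predict a binary monotonicity label from vector representations. It is not the paper's implementation: the vectors are synthetic stand-ins for LM hidden states, the label encoding (1 for downward monotone, 0 for upward monotone) is an assumption, and the helper names `make_synthetic_states`, `train_probe`, and `accuracy` are hypothetical. The paper's DC ranking method is not reproduced here.

```python
import math
import random


def make_synthetic_states(n, dim=4, seed=0):
    """Create fake 'hidden states' (assumption: real ones would come from an LM).

    Dimension 0 carries the monotonicity signal; the others are noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = downward monotone, 0 = upward (assumed encoding)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += 3.0 if label == 1 else -3.0
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(data, epochs=200, lr=0.1):
    """Fit a linear probe with full-batch gradient descent on the log loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = predict_proba(w, b, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, data):
    hits = sum(int((predict_proba(w, b, x) >= 0.5) == (y == 1)) for x, y in data)
    return hits / len(data)


train = make_synthetic_states(200, seed=0)
held_out = make_synthetic_states(100, seed=1)
w, b = train_probe(train)
acc = accuracy(w, b, held_out)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy for such a probe indicates only that the label is linearly decodable from the representations. It does not by itself show that the model uses that information, which is the gap the abstract says the DC ranking method is meant to address.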
