Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs
Kanishka Misra, Kyle Mahowald

TL;DR
This study demonstrates that language models can learn rare grammatical phenomena like the AANN construction through generalization from more common related constructions, rather than solely memorization.
Contribution
The paper provides evidence that transformer language models can acquire rare syntactic phenomena via generalization, challenging the view that such learning is mainly memorization.
Findings
AANNs are learned better than perturbed variants.
Learning is enhanced with increased input variability.
Generalization from related constructions explains rare phenomenon acquisition.
Abstract
Language models learn rare syntactic phenomena, but the extent to which this is attributable to generalization vs. memorization is a major open question. To that end, we iteratively trained transformer language models on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction (``a beautiful five days''). We compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which AANN sentences were removed. We found that AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., ``a few days''). An additional experiment showed that this…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
