Derivational Morphology Reveals Analogical Generalization in Large Language Models
Valentin Hofmann, Leonie Weissweiler, David Mortensen, Hinrich Schütze, Janet Pierrehumbert

TL;DR
This study investigates whether large language models such as GPT-J generalize linguistically by applying rules or by analogy to similar stored examples. It finds evidence for analogical processes, especially for variable morphological patterns.
Contribution
The paper introduces a novel method comparing rule-based and analogical models to explain LLM behavior, revealing the prominence of analogy in morphological generalization.
Findings
Analogical models better explain variable nominalization patterns.
GPT-J's behavior is influenced by word frequency, supporting analogy.
Rule-based explanations are insufficient for irregular morphological patterns.
Abstract
What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. As of yet, it is not known whether linguistic generalization in LLMs could equally well be explained as the result of analogical processes, which can be formalized as similarity operations on stored exemplars. A key shortcoming of prior research is its focus on linguistic phenomena with a high degree of regularity, for which rule-based and analogical approaches make the same predictions. Here, we instead examine derivational morphology, specifically English adjective nominalization, which displays notable variability. We introduce a new method for investigating linguistic generalization in LLMs: focusing on GPT-J, we fit cognitive models that instantiate…
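The contrast the abstract draws between rules and "similarity operations on stored exemplars" can be made concrete with a toy sketch. This is not the pair of cognitive models fit in the paper: the lexicon, the suffix-overlap similarity, and the names `EXEMPLARS`, `shared_suffix_len`, `analogy_probs`, and `rule_predict` are all hypothetical and chosen only for illustration. The sketch predicts whether an English adjective nominalizes with -ity or -ness, once with a categorical rule and once by summing similarity to stored exemplars.

```python
import math

# Hypothetical toy lexicon (an assumption, not data from the paper):
# adjective -> nominalizing suffix.
EXEMPLARS = {
    "available": "ity",
    "readable": "ity",
    "capable": "ity",
    "sensitive": "ity",
    "active": "ity",
    "selfish": "ness",
    "foolish": "ness",
    "aware": "ness",
    "happy": "ness",
}


def shared_suffix_len(a: str, b: str) -> int:
    """Length of the longest common word-final substring of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def analogy_probs(adjective: str) -> dict:
    """Analogical prediction: each stored exemplar votes for its own
    suffix, weighted by exp(similarity) to the new adjective."""
    support = {"ity": 0.0, "ness": 0.0}
    for exemplar, suffix in EXEMPLARS.items():
        support[suffix] += math.exp(shared_suffix_len(adjective, exemplar))
    total = sum(support.values())
    return {suffix: s / total for suffix, s in support.items()}


def rule_predict(adjective: str) -> str:
    """Rule-based prediction: a hand-written categorical rule
    (illustrative only) with no gradience."""
    if adjective.endswith(("able", "ive")):
        return "ity"
    return "ness"


if __name__ == "__main__":
    for word in ["drinkable", "sheepish"]:
        probs = analogy_probs(word)
        print(word, rule_predict(word), {k: round(v, 3) for k, v in probs.items()})
```

For a highly regular class such as adjectives in -able, the two approaches agree, which is the abstract's point about why regular phenomena cannot tell them apart. The analogical sketch additionally yields graded preferences that depend on which exemplars are stored, which is where the two kinds of model can diverge on variable patterns.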
Code & Models
- 🤗 BabyLM-community/babylm-baseline-100m-gpt-bert-causal-focus · model · 7.9k downloads · ♡ 2
- 🤗 BabyLM-community/babylm-baseline-100m-gpt-bert-mixed · model · 7.7k downloads
- 🤗 BabyLM-community/babylm-baseline-10m-gpt-bert-mixed · model · 7.7k downloads
- 🤗 BabyLM-community/babylm-baseline-10m-gpt-bert-causal-focus · model · 7.9k downloads
- 🤗 BabyLM-community/babylm-baseline-10m-gpt-bert-masked-focus · model · 7.9k downloads
- 🤗 BabyLM-community/babylm-baseline-100m-gpt-bert-masked-focus · model · 7.5k downloads
- 🤗 BabyLM-community/babylm-interaction-baseline-simpo · model · 2.1k downloads · ♡ 2
- 🤗 BabyLM-community/babylm-baseline-100m-gpt2 · model · 7.7k downloads
- 🤗 BabyLM-community/babylm-baseline-10m-gpt2 · model · 5.3k downloads
- 🤗 llm-slice/blm-gpt2s-90M-s42 · model · 1 download
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution
Methods: Sparse Evolutionary Training · Focus
