Derivational Morphology Reveals Analogical Generalization in Large Language Models

Valentin Hofmann; Leonie Weissweiler; David Mortensen; Hinrich Schütze; Janet Pierrehumbert

arXiv:2411.07990·cs.CL·November 13, 2024

TL;DR

This study investigates whether large language models like GPT-J generalize linguistically through rules or analogical similarity, finding evidence that supports analogical processes, especially for variable morphological patterns.

Contribution

The paper introduces a novel method comparing rule-based and analogical models to explain LLM behavior, revealing the prominence of analogy in morphological generalization.

Findings

01 Analogical models better explain variable nominalization patterns.

02 GPT-J's behavior is influenced by word frequency, supporting analogy.

03 Rule-based explanations are insufficient for irregular morphological patterns.

Abstract

What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most studies analyzing the extent to which the language skills of LLMs resemble rules. As of yet, it is not known whether linguistic generalization in LLMs could equally well be explained as the result of analogical processes, which can be formalized as similarity operations on stored exemplars. A key shortcoming of prior research is its focus on linguistic phenomena with a high degree of regularity, for which rule-based and analogical approaches make the same predictions. Here, we instead examine derivational morphology, specifically English adjective nominalization, which displays notable variability. We introduce a new method for investigating linguistic generalization in LLMs: focusing on GPT-J, we fit cognitive models that instantiate…
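
The abstract describes analogy as "similarity operations on stored exemplars". As a rough illustration of that idea only, the sketch below scores the two English nominalizing suffixes -ity and -ness for a new adjective by letting each stored exemplar vote with a weight that grows with how much of its ending it shares with the new word. The exemplar list, the shared-ending similarity measure, the `sensitivity` parameter, and all function names are assumptions made for this example; this is not the cognitive model fitted in the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy exemplars: (adjective, attested nominalizing suffix).
# Illustrative only; not the data used in the paper.
EXEMPLARS = [
    ("available", "ity"),
    ("readable", "ity"),
    ("sensitive", "ity"),
    ("productive", "ity"),
    ("selfish", "ness"),
    ("foolish", "ness"),
    ("careless", "ness"),
    ("helpless", "ness"),
]


def shared_ending_length(a: str, b: str) -> int:
    """Count how many word-final characters two strings have in common."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def analogical_scores(word, exemplars=EXEMPLARS, sensitivity=1.0):
    """Return one probability per suffix from similarity-weighted exemplar votes.

    Assumed similarity: exp(sensitivity * shared ending length).
    """
    support = defaultdict(float)
    for form, suffix in exemplars:
        support[suffix] += math.exp(sensitivity * shared_ending_length(word, form))
    total = sum(support.values())
    return {suffix: weight / total for suffix, weight in support.items()}


def predict(word):
    """Pick the suffix with the highest analogical support."""
    scores = analogical_scores(word)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for adjective in ("drinkable", "boorish"):
        print(adjective, predict(adjective), analogical_scores(adjective))
```

With these toy exemplars, an adjective ending in -able draws most of its support from the stored -able words and so favors -ity, while one ending in -ish favors -ness. No rule is stated anywhere in the code: the preference arises purely from similarity to stored items, which is the contrast with rule-based accounts that the abstract draws.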

Taxonomy

Topics: Natural Language Processing Techniques · Language and cultural evolution

Methods: Sparse Evolutionary Training · Focus