Differences in Typological Alignment in Language Models' Treatment of Differential Argument Marking
Iskar Deng, Nathalia Xu, Shane Steinert-Threlkeld

TL;DR
This study trains GPT-2 models on synthetic corpora to test whether they show typological preferences in differential argument marking (DAM). The models show a human-like preference for natural markedness direction but not for object marking.
Contribution
The paper extends the analysis of typological preferences in language models from word order to differential argument marking, and shows that models behave differently on different typological dimensions.
Findings
Models prefer the natural markedness direction, in which overt marking targets semantically atypical arguments.
Models do not replicate the human-like object preference in DAM.
Different typological tendencies may stem from different underlying sources.
Abstract
Recent work has shown that language models (LMs) trained on synthetic corpora can exhibit typological preferences that resemble cross-linguistic regularities in human languages, particularly for syntactic phenomena such as word order. In this paper, we extend this paradigm to differential argument marking (DAM), a semantic licensing system in which morphological marking depends on semantic prominence. Using a controlled synthetic learning method, we train GPT-2 models on 18 corpora implementing distinct DAM systems and evaluate their generalization using minimal pairs. Our results reveal a dissociation between two typological dimensions of DAM. Models reliably exhibit human-like preferences for natural markedness direction, favoring systems in which overt marking targets semantically atypical arguments. In contrast, models do not reproduce the strong object preference in human…
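The minimal-pair evaluation mentioned in the abstract can be illustrated with a short sketch. It is not the authors' code. It assumes a model is counted as correct on a pair when it scores the acceptable sentence strictly higher than the unacceptable one. The scorer `toy_score`, the marker `-ka`, and the example sentences are all hypothetical stand-ins; in a real setup the scorer would return a sentence log-probability from a trained GPT-2 model.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    score: Callable[[str], float],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs where the
    acceptable sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for an LM log-probability.

    Rewards the invented marker '-ka' on 'teacher' (treated here as a
    semantically atypical object) and penalizes it on 'rock'.
    """
    total = 0.0
    for token in sentence.split():
        if token == "teacher-ka":
            total += 1.0
        elif token == "rock-ka":
            total -= 1.0
    return total


# Each pair differs only in whether the object carries the marker.
pairs = [
    ("dog sees teacher-ka", "dog sees teacher"),
    ("dog sees rock", "dog sees rock-ka"),
]

accuracy = minimal_pair_accuracy(toy_score, pairs)
print(f"minimal-pair accuracy: {accuracy:.2f}")
```

Passing the scorer as a function keeps the accuracy computation independent of any particular model, so the same routine could be reused across corpora implementing different marking systems.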
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Language and cultural evolution · Natural Language Processing Techniques
