Function Words as Statistical Cues for Language Learning
Xiulin Yang, Heidi Getz, Ethan Gotlieb Wilcox

TL;DR
This study examines how the statistical distribution of function words supports language learning. It shows that three distributional properties of function words hold across languages, and that neural language models acquire grammatical structure more readily when those properties are preserved.
Contribution
It provides a cross-linguistic corpus analysis of function-word properties in 186 languages and shows, through counterfactual language modeling and ablation experiments on English, that these properties matter for neural language learners.
Findings
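All three distributional properties of function words (high frequency, reliable syntactic association, and phrase-boundary alignment) are universal across 186 languages.
Preserving these properties in the training input facilitates acquisition in neural learners.
A Goldilocks effect emerges: function words must be frequent enough to be reliable, yet diverse enough to remain informative about structural dependencies.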
Abstract
What statistical properties might support learning abstract grammatical knowledge from linear input? We address this question by examining the statistical distribution of function words. Function words have been argued to aid acquisition through three distributional properties: high frequency, reliable syntactic association, and phrase-boundary alignment. We conduct a cross-linguistic corpus analysis of 186 languages, which confirms that all three properties are universal. Using counterfactual language modeling and ablation experiments on English, we show that preserving these properties facilitates acquisition in neural learners, with a Goldilocks effect: function words must be frequent enough to be reliable, yet diverse enough to remain informative to structural dependency. Probing analyses further reveal that different learning conditions produce systematically different reliance on…
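As a rough illustration of the first property (high frequency) and of the frequency/diversity trade-off behind the Goldilocks effect, the sketch below measures how much of a toy corpus consists of function words and how many distinct function words occur. It is not the paper's method: the `FUNCTION_WORDS` list, the `function_word_stats` helper, and the example sentence are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical closed-class list for illustration only;
# this is NOT the inventory used in the paper.
FUNCTION_WORDS = {"the", "a", "of", "in", "on", "to", "and", "is"}


def function_word_stats(tokens):
    """Return (token_share, type_count) for function words in `tokens`.

    token_share: fraction of all tokens that are function words (frequency).
    type_count:  number of distinct function words observed (diversity).
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    fw_tokens = sum(c for w, c in counts.items() if w in FUNCTION_WORDS)
    fw_types = sum(1 for w in counts if w in FUNCTION_WORDS)
    share = fw_tokens / total if total else 0.0
    return share, fw_types


# Toy corpus (an assumption, not data from the paper).
corpus = "the cat sat on the mat and the dog slept in a box".split()
share, n_types = function_word_stats(corpus)
print(f"function-word token share: {share:.2f}, distinct function words: {n_types}")
```

On real data, the same two quantities would be computed over a tagged corpus rather than a hand-written word list; the word-list approach is used here only to keep the sketch self-contained.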
