TL;DR
This paper explores how neural language models acquire words, compares their learning patterns to those of children, and identifies key factors that influence word learning across model architectures and training stages.
Contribution
It provides a detailed analysis of word acquisition in neural models, highlighting differences from child language development and revealing consistent learning patterns across architectures.
Findings
Models rely heavily on word frequency early in training.
Models, like children, learn words more slowly when those words appear in longer utterances.
Models follow consistent patterns during training, whether unidirectional or bidirectional, LSTM or Transformer.
Abstract
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words' ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition. Language models rely far more on word frequency than children, but like children, they exhibit slower learning of words in longer utterances. Interestingly, models follow consistent patterns during training for both unidirectional and bidirectional models, and for both LSTM and Transformer…
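As a rough illustration of what "extracting an age of acquisition from a learning curve" could look like, here is a minimal Python sketch. The abstract above does not state the exact procedure, so everything here is an assumption made for illustration: the function name `age_of_acquisition`, the use of per-word surprisal as the curve, the midpoint-between-baseline-and-minimum cutoff, and the interpolation in log10 training steps. The numbers in the example are invented toy values, not results from the paper.

```python
import math


def age_of_acquisition(steps, surprisals, baseline=None):
    """Hypothetical helper: estimate when a word is 'acquired'.

    steps      -- increasing training-step counts (all > 0)
    surprisals -- mean surprisal of the word at each step (lower = better)
    baseline   -- assumed 'unlearned' surprisal; defaults to the first point

    Returns the log10 training step at which the curve first falls to the
    midpoint between baseline and its minimum, or None if the curve never
    improves on the baseline.
    """
    if len(steps) != len(surprisals) or len(steps) < 2:
        raise ValueError("need at least two aligned (step, surprisal) points")
    if baseline is None:
        baseline = surprisals[0]

    best = min(surprisals)
    if best >= baseline:
        return None  # the word was never learned better than baseline

    cutoff = (baseline + best) / 2.0
    log_steps = [math.log10(s) for s in steps]

    if surprisals[0] <= cutoff:
        return log_steps[0]  # already past the cutoff at the first checkpoint

    for i in range(1, len(surprisals)):
        prev, curr = surprisals[i - 1], surprisals[i]
        if prev > cutoff >= curr:
            # Linear interpolation between the two checkpoints (log-step axis).
            frac = (prev - cutoff) / (prev - curr)
            return log_steps[i - 1] + frac * (log_steps[i] - log_steps[i - 1])
    return None


# Toy learning curve (invented values, for illustration only).
toy_steps = [100, 1000, 10000, 100000]
toy_surprisals = [12.0, 10.0, 6.0, 4.0]
toy_aoa = age_of_acquisition(toy_steps, toy_surprisals)
print(toy_aoa)
```

With these toy values the cutoff is 8.0, which the curve crosses halfway between the checkpoints at 1,000 and 10,000 steps on the log scale. Per-word values obtained this way could then be regressed on predictors such as word frequency or utterance length, which is the kind of analysis the abstract describes.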
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Tanh Activation · Cosine Annealing · Sigmoid Activation · Linear Warmup With Cosine Annealing · WordPiece · Position-Wise Feed-Forward Layer · Adam
