BabyLM's First Constructions: Causal probing provides a signal of learning
Joshua Rozner, Leonie Weissweiler, Cory Shain

TL;DR
This paper investigates whether masked language models trained on developmentally plausible data can learn linguistic constructions, showing they do learn diverse constructions and that better constructional understanding correlates with improved benchmark performance.
Contribution
The paper demonstrates that masked language models trained on developmentally plausible quantities of data can acquire diverse constructions, which supports the construction grammar hypothesis under conditions closer to human language learning.
Findings
Models learn diverse constructions from developmentally plausible quantities of data.
Performance on constructions correlates with benchmark performance.
Even hard cases that are superficially indistinguishable are learned.
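The correlational finding can be made concrete with a small sketch. This summary does not say which statistic the authors use, so the choice of Spearman rank correlation here is an assumption, and the function names (`average_ranks`, `pearson`, `spearman`) are hypothetical helpers written for illustration. The idea is to take one constructional score and one benchmark score per model and ask whether the two orderings of models agree.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(construction_scores, benchmark_scores):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Each position in the two sequences is assumed to describe the same model.
    """
    if len(construction_scores) != len(benchmark_scores):
        raise ValueError("need one score of each kind per model")
    if len(construction_scores) < 2:
        raise ValueError("need at least two models")
    return pearson(average_ranks(construction_scores),
                   average_ranks(benchmark_scores))
```

Calling `spearman(construction_scores, benchmark_scores)` with one entry per model returns a value in [-1, 1]; values near 1 mean that models ranking higher on constructions also tend to rank higher on the benchmark. A rank-based statistic is used in this sketch because it does not assume the two scores are on comparable scales or linearly related. A correlation of this kind does not by itself show that constructional knowledge causes better benchmark performance.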
Abstract
Construction grammar posits that language learners acquire constructions (form-meaning pairings) from the statistics of their environment. Recent work supports this hypothesis by showing sensitivity to constructions in pretrained language models (PLMs), including one recent study (Rozner et al., 2025) demonstrating that constructions shape RoBERTa's output distribution. However, models under study have generally been trained on developmentally implausible amounts of data, casting doubt on their relevance to human language learning. Here we use Rozner et al.'s methods to evaluate construction learning in masked language models from the 2024 BabyLM Challenge. Our results show that even when trained on developmentally plausible quantities of data, models learn diverse constructions, even hard cases that are superficially indistinguishable. We further find correlational evidence that…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques · Machine Learning and Algorithms
