Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?
Arzu Burcu Güven, Anna Rogers, Rob van der Goot

TL;DR
This paper investigates whether syntactic categories can improve curriculum learning for language models. It analyzes the syntax of child language data and tests several curriculum strategies, finding that training on syntactically categorizable subsets of the data improves performance.
Contribution
It introduces developmentally motivated curricula based on syntactic categories and shows that training on the syntactically categorizable subset of the data outperforms training on the full, noisy corpus.
Findings
Syntactic knowledge about the training data helps interpret model performance on linguistic tasks.
Some of the tested curricula improve results on reading tasks.
Training on syntactically categorizable data outperforms training on the full, noisy dataset and accounts for most of the performance gain.
Abstract
We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore a developmental curriculum and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvement comes from using the subset of syntactically categorizable data rather than the full, noisy corpus.
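The abstract combines two ingredients: restricting training to utterances that can be assigned a syntactic category, and presenting that data in a developmentally motivated order. The following minimal Python sketch makes these two steps concrete. It assumes each utterance already carries a syntactic category label (or None when it could not be categorized). The category names, their order, and the function `build_curriculum` are hypothetical illustrations and do not reproduce the authors' categories or training pipeline.

```python
from typing import Optional

# Hypothetical category order, from "earlier acquired" to "later acquired".
# These labels and this ordering are illustrative assumptions, not the paper's.
CATEGORY_ORDER = ["fragment", "simple", "compound", "complex"]


def build_curriculum(utterances: list[tuple[str, Optional[str]]]) -> list[str]:
    """Keep only categorizable utterances, then order them by category rank.

    Each item is (text, category); category is None for uncategorizable text.
    """
    rank = {cat: i for i, cat in enumerate(CATEGORY_ORDER)}
    # Step 1: filter out utterances without a known syntactic category.
    categorizable = [(text, cat) for text, cat in utterances if cat in rank]
    # Step 2: order by category rank; sorted() is stable, so utterances
    # within the same category keep their original relative order.
    ordered = sorted(categorizable, key=lambda pair: rank[pair[1]])
    return [text for text, _ in ordered]


data = [
    ("the dog that barked ran away", "complex"),
    ("uh huh", None),
    ("doggy", "fragment"),
    ("the dog ran", "simple"),
    ("xxx yyy", None),
]
print(build_curriculum(data))
# → ['doggy', 'the dog ran', 'the dog that barked ran away']
```

Dropping the filter and keeping every utterance would correspond to training on the full, noisy corpus. Replacing the sort with a random shuffle of the categorizable subset would isolate the effect of filtering from the effect of ordering, which is the distinction the abstract draws.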
Taxonomy
Topics: Language Development and Disorders · Second Language Acquisition and Learning · Text Readability and Simplification
