Automatic Annotation of Grammaticality in Child-Caregiver Conversations
Mitja Nikolaus (ILCB, LPL, LIS, TALEP), Abhishek Agrawal (ILCB, LIS, TALEP), Petros Kaklamanis, Alex Warstadt (SED), Abdellah Fourtassi (ILCB, LIS, TALEP)

TL;DR
This paper introduces an automatic annotation tool for grammaticality in child-caregiver conversations, enabling large-scale, reproducible studies of language development using NLP models, especially fine-tuned Transformers.
Contribution
It presents a new coding scheme and trained Transformer-based models for automatic grammaticality annotation in child language data, achieving human-level agreement.
Findings
Fine-tuned Transformer-based models outperform the other NLP models evaluated.
Automatic annotations align with human judgments.
Children's grammaticality increases with age.
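To make the agreement finding above concrete, here is a minimal sketch of how chance-corrected agreement between human and model labels could be computed. It assumes Cohen's kappa as the metric and binary labels (1 = grammatical, 0 = ungrammatical); this summary does not state which agreement measure or label set the paper uses, and the function name and toy labels are hypothetical, not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (assumption): 1 = grammatical, 0 = ungrammatical.
human = [1, 1, 0, 1, 0, 1, 1, 0]
model = [1, 1, 0, 1, 1, 1, 1, 0]
kappa = cohen_kappa(human, model)
print(f"model-human kappa: {kappa:.3f}")
```

Under this assumption, a model reaches human-level agreement when its kappa against a human annotator is comparable to the kappa between two human annotators on the same utterances.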
Abstract
The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver conversations, tools for automatic annotation can offer an effective alternative to tedious manual annotation. We propose a coding scheme for context-dependent grammaticality in child-caregiver conversations and annotate more than 4,000 utterances from a large corpus of transcribed conversations. Based on these annotations, we train and evaluate a range of NLP models. Our results show that fine-tuned Transformer-based models perform best, achieving human inter-annotation agreement levels. As a first application and sanity check of this tool, we use the trained models to annotate a corpus almost two orders of magnitude larger than the manually annotated data and…
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Digital Communication and Language
