CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions
Francesca Padovani, Xiulin Yang, Bastian Bunzeck, Jaap Jumelet, Yevgen Matusevych, Nathan Schneider, Arianna Bisazza

TL;DR
This paper introduces CAIT, an open-source toolkit comprising a dependency parser, a POS tagger, and an utterance-level construction tagger. It is tailored to analyzing the syntactic structure of child-adult interactions in CHILDES and is aimed at language acquisition research.
Contribution
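The paper presents a new dependency parser, trained on the UD-English-CHILDES treebank, and accompanying tools designed specifically for CHILDES data. The parser outperforms off-the-shelf English parsers such as SpaCy and Stanza, and the toolkit is intended to facilitate large-scale language acquisition studies.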
Findings
The parser outperforms SpaCy and Stanza on CHILDES data.
The toolkit enables detailed analysis of syntactic development over time.
Error analysis highlights areas for future improvement.
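This summary does not say which metrics underlie the comparison with SpaCy and Stanza. Dependency parsers are commonly compared using unlabeled and labeled attachment scores (UAS and LAS), so the sketch below assumes those two metrics. The function name `attachment_scores`, the `(head, label)` token representation, and the toy sentence are hypothetical illustrations, not part of CAIT or of the paper's evaluation.

```python
from typing import List, Tuple

# Hypothetical representation: each token is (head_index, dependency_label),
# with head index 0 standing for the root of the sentence.
Token = Tuple[int, str]


def attachment_scores(
    gold: List[List[Token]], pred: List[List[Token]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens of a parsed corpus.

    UAS counts tokens whose predicted head matches the gold head.
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")

    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences must share tokenization")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1
                if g_rel == p_rel:
                    las_hits += 1

    if total == 0:
        raise ValueError("no tokens to score")
    return uas_hits / total, las_hits / total


# Toy utterance "you want juice": the prediction attaches every token
# correctly but mislabels the last relation.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "nmod")]]
uas, las = attachment_scores(gold, pred)
```

Under these assumptions, scoring each parser's output against the same gold treebank gives directly comparable UAS and LAS values.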
Abstract
CHILDES is a paramount resource for language acquisition studies, yet computational tools for analyzing its syntactic structure remain limited. Leveraging the recent release of the UD-English-CHILDES treebank with gold-standard Universal Dependencies (UD) annotations, we train a state-of-the-art dependency parser specifically tailored to CHILDES. The parser more accurately captures syntactic patterns in child-adult interactions, outperforming widely used off-the-shelf English parsers, including SpaCy and Stanza. Alongside the parser, we also release a Part-of-Speech tagger and an utterance-level construction tagger, which together form the open-source Syntactic Parsing Toolkit for Child-Adult InTeractions (CAIT). Through a detailed error analysis and a case study tracking the distribution of syntactic constructions across developmental time in CHILDES, we demonstrate the practical…
