Morphosyntactic Analysis for CHILDES

Houjun Liu; Brian MacWhinney

arXiv:2407.12389·cs.CL·July 18, 2024·2 cites

Open Access

TL;DR

This paper describes a consistent morphosyntactic analysis of CHILDES data across 27 languages, built on the UD (Universal Dependencies) framework and the Batchalign2 program, to support crosslinguistic comparison in language development research.

Contribution

It uses Batchalign2 to transcribe and link CHILDES data and applies the UD framework to that data for morphosyntactic analysis, creating a standardized resource for crosslinguistic studies of language learning.

Findings

01. Successful application of the UD framework to 27 languages
02. Enhanced comparability of language development data
03. New resources for crosslinguistic research

Abstract

Language development researchers are interested in comparing the process of language learning across languages. Unfortunately, it has been difficult to construct a consistent quantitative framework for such comparisons. However, recent advances in AI (Artificial Intelligence) and ML (Machine Learning) are providing new methods for ASR (automatic speech recognition) and NLP (natural language processing) that can be brought to bear on this problem. Using the Batchalign2 program (Liu et al., 2023), we have been transcribing and linking data for the CHILDES database and have applied the UD (Universal Dependencies) framework to provide a consistent and comparable morphosyntactic analysis for 27 languages. These new resources open possibilities for deeper crosslinguistic study of language learning.
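
To make concrete what a "consistent and comparable" UD-style analysis offers, the following is a minimal sketch, not the authors' pipeline. It assumes the analysis is available in CoNLL-U, the UD project's standard interchange format; the abstract does not say how the annotations are stored in CHILDES. The sample sentence, the `parse_conllu` helper, and the dictionary keys are all hypothetical illustrations.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment (not taken from CHILDES). Each token line has
# ten tab-separated fields:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = "\n".join([
    "# text = the dogs bark",
    "1\tthe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tdogs\tdog\tNOUN\t_\tNumber=Plur\t3\tnsubj\t_\t_",
    "3\tbark\tbark\tVERB\t_\tMood=Ind|Tense=Pres\t0\troot\t_\t_",
])


def parse_conllu(text):
    """Parse CoNLL-U text into a list of token dicts (illustrative helper)."""
    tokens = []
    for line in text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges ("1-2"), and empty nodes ("1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append({
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


tokens = parse_conllu(SAMPLE)

# Because UPOS tags and feature names are shared across languages, the same
# counting code can be run unchanged on data from any UD-annotated language.
upos_counts = Counter(t["upos"] for t in tokens)
```

Under this assumption, a measure such as the share of tokens tagged `VERB`, or the frequency of `Number=Plur`, can be computed the same way for every language, which is the kind of comparison a shared tagset makes possible.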

Taxonomy

Topics: Natural Language Processing Techniques · Language and cultural evolution · Text Readability and Simplification