Nonwords Pronunciation Classification in Language Development Tests for Preschool Children
Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet

TL;DR
This study evaluates various speech feature extraction methods for classifying nonword pronunciation accuracy in preschool children, aiming to support early language development assessment.
Contribution
It compares multiple phonetic and non-phonetic feature extraction approaches and demonstrates that phonetic modeling significantly improves classification accuracy.
Findings
ASR-based phonetic features outperform non-phonetic features
Best system achieved 89.4% accuracy and 0.923 AUC
Granular phonetic modeling enhances recognition rates
Abstract
This work aims to automatically evaluate whether the language development of children is age-appropriate. Validated speech and language tests are used for this purpose to test the auditory memory. In this work, the task is to determine whether spoken nonwords have been uttered correctly. We compare different approaches that are motivated to model specific language structures: Low-level features (FFT), speaker embeddings (ECAPA-TDNN), grapheme-motivated embeddings (wav2vec 2.0), and phonetic embeddings in form of senones (ASR acoustic model). Each of the approaches provides input for VGG-like 5-layer CNN classifiers. We also examine the adaptation per nonword. The evaluation of the proposed systems was performed using recordings from different kindergartens of spoken nonwords. ECAPA-TDNN and low-level FFT features do not explicitly model phonetic information; wav2vec2.0 is trained on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and Audio Processing · Language Development and Disorders
MethodsTest
