Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations
Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee

TL;DR
This study introduces a holistic, speaker verification-based method that uses posterior features from deep neural networks to automatically detect speech sound disorder (SSD) in children's speech. It reaches 78.2% unweighted average recall by analyzing long, subject-level utterances rather than individual phonemes.
Contribution
It proposes a novel subject-level detection approach that avoids detecting individual phoneme-level errors. Instead, it uses deep neural network posterior features and i-vectors to improve SSD detection.
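The following is a minimal sketch of how a subject-level representation can be built from frame-level DNN posteriors. It assumes each test word has already been turned into a posteriorgram, where each row is one frame's posterior distribution over phone classes. It then concatenates the words into one long utterance and averages over frames. The averaging step and the name `subject_representation` are illustrative assumptions. This is a simplified stand-in, not the paper's i-vector or posterior-based representations.

```python
import numpy as np

def subject_representation(word_posteriors):
    """Build one fixed-length vector per subject from per-word posteriorgrams.

    word_posteriors: list of arrays, each of shape (num_frames_i, num_phones);
    each row is a frame-level DNN posterior distribution over phone classes.
    Returns a (num_phones,) vector (hypothetical mean-posterior descriptor).
    """
    # Concatenating the test words along time forms one long "utterance".
    long_utterance = np.concatenate(word_posteriors, axis=0)
    # Averaging over frames gives a fixed-length, holistic descriptor,
    # regardless of how many words or frames the subject produced.
    return long_utterance.mean(axis=0)

# Hypothetical example: two test words, three phone classes.
word_a = np.array([[0.7, 0.2, 0.1],
                   [0.6, 0.3, 0.1]])
word_b = np.array([[0.1, 0.1, 0.8]])
rep = subject_representation([word_a, word_b])
print(rep)
```

Because every row is a probability distribution, the averaged vector also sums to one. This means subjects with different numbers of frames remain directly comparable.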
Findings
Achieved 78.2% unweighted average recall in SSD detection.
Outperformed previous phoneme-level fusion methods.
Demonstrated effectiveness on Cantonese-speaking children.
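Unweighted average recall (UAR) is the mean of the per-class recalls, so the SSD and typically developing groups count equally even when their sizes differ. The short sketch below shows the computation. The labels are made up for illustration and are not data from the study.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class is weighted equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Positions of subjects whose true label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical labels: 1 = SSD, 0 = typically developing.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
uar = unweighted_average_recall(y_true, y_pred)
print(uar)
```

In this toy case, recall is 3/4 for the SSD class and 4/6 for the other class. The UAR is therefore about 0.708, while plain accuracy is 0.7. The gap between the two metrics widens as the classes become more imbalanced.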
Abstract
This paper presents a macroscopic approach to automatic detection of speech sound disorder (SSD) in child speech. Typically, SSD is manifested by persistent articulation and phonological errors on specific phonemes in the language. The disorder can be detected by focally analyzing the phonemes or words elicited from the child subject. In the present study, instead of attempting to detect individual phone- and word-level errors, we propose to extract a subject-level representation from a long utterance constructed by concatenating multiple test words. The speaker verification approach and posterior features generated by deep neural network models are applied to derive various types of holistic representations. A linear classifier is trained to differentiate disordered speech from normal speech. On the task of detecting SSD in Cantonese-speaking children, experimental results show…
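The abstract does not say which linear classifier is used. The sketch below therefore assumes plain logistic regression trained by batch gradient descent on subject-level vectors. The function names, learning rate, epoch count and toy data are all hypothetical, so this illustrates the general idea of a linear SSD-vs-normal decision rather than the authors' setup.

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit w, b for p(SSD | x) = sigmoid(w.x + b) by batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted SSD probability
        grad = p - y  # per-sample derivative of log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    # Linear decision boundary: label 1 (SSD) when w.x + b > 0.
    return (X @ w + b > 0).astype(int)

# Hypothetical subject-level vectors (e.g. averaged posteriors) and labels.
X = np.array([[0.8, 0.1], [0.7, 0.2], [0.2, 0.7], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
w, b = train_logistic_regression(X, y)
print(predict(X, w, b))
```

A linear model suits this setting because the holistic representations are fixed-length vectors and the number of child subjects is typically small. In that regime, a low-capacity classifier is less prone to overfitting.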
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research
Methods: Speaker verification · i-vectors · DNN posterior features · Linear classifier
