Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations

Si-Ioi Ng; Cymie Wing-Yee Ng; Jiarui Wang; Tan Lee

arXiv:2203.15405 · eess.AS · June 30, 2022

TL;DR

This study introduces a holistic, speaker-verification-based method that uses posterior features from deep neural networks to detect speech sound disorder (SSD) in children's speech automatically, reaching 78.2% unweighted average recall with subject-level analysis of long, concatenated utterances.

Contribution

It proposes a subject-level detection approach that avoids detecting individual phoneme-level errors, using deep neural network posterior features and i-vectors to improve SSD detection.

Findings

01 Achieved 78.2% unweighted average recall in SSD detection.

02 Outperformed previous phoneme-level fusion methods.

03 Demonstrated effectiveness on Cantonese-speaking children.
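
The headline metric, unweighted average recall (UAR), is the mean of the per-class recalls, so each class counts equally regardless of how many subjects it contains. The sketch below shows the computation on made-up labels; the function name, the class names "SSD" and "TD", and the toy data are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def unweighted_average_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall; every class has equal weight."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Toy, imbalanced data (hypothetical): 2 disordered and 8 typical subjects.
y_true = ["SSD"] * 2 + ["TD"] * 8
y_pred = ["SSD", "TD"] + ["TD"] * 8  # one disordered subject is missed

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

On this toy set the accuracy is 0.90 while the UAR is 0.75, which illustrates why UAR is a common choice when the disordered class is much smaller than the typical class.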

Abstract

This paper presents a macroscopic approach to automatic detection of speech sound disorder (SSD) in child speech. Typically, SSD is manifested by persistent articulation and phonological errors on specific phonemes in the language. The disorder can be detected by focally analyzing the phonemes or the words elicited from the child subject. In the present study, instead of attempting to detect individual phone- and word-level errors, we propose to extract a subject-level representation from a long utterance that is constructed by concatenating multiple test words. The speaker verification approach and posterior features generated by deep neural network models are applied to derive various types of holistic representations. A linear classifier is trained to differentiate disordered speech from normal speech. On the task of detecting SSD in Cantonese-speaking children, experimental results show…
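
The pipeline described in the abstract (concatenate test words, derive one subject-level representation, classify with a linear model) might be sketched roughly as follows. This is a simplified illustration under several assumptions that are not taken from the paper: the posterior features are treated as frame-level probability vectors, mean pooling stands in for the paper's representation extractors, and a perceptron stands in for the linear classifier. All function names and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # assumed: one posterior vector per frame, summing to 1


def concatenate_words(words: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Join the frame sequences of all test words into one long utterance."""
    return [frame for word in words for frame in word]


def mean_pool(frames: Sequence[Frame]) -> Frame:
    """Average frames over time (a stand-in for the paper's representations)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def train_perceptron(X: Sequence[Frame], y: Sequence[int],
                     max_epochs: int = 1000) -> Tuple[List[float], float]:
    """Train a linear classifier; labels are +1 (disordered) or -1 (typical)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # misclassified: move the boundary
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:  # every training subject is classified correctly
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy subjects (hypothetical): each is a list of words, each word a list of frames.
subjects = [
    [[[0.9, 0.05, 0.05], [0.7, 0.15, 0.15]], [[0.8, 0.1, 0.1]]],  # typical
    [[[0.7, 0.2, 0.1]], [[0.7, 0.2, 0.1]]],                       # typical
    [[[0.4, 0.3, 0.3]], [[0.4, 0.3, 0.3]]],                       # disordered
    [[[0.3, 0.4, 0.3], [0.3, 0.4, 0.3]], [[0.3, 0.4, 0.3]]],      # disordered
]
labels = [-1, -1, 1, 1]

# One fixed-length vector per subject, independent of utterance length.
X = [mean_pool(concatenate_words(words)) for words in subjects]
w, b = train_perceptron(X, labels)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The point of the design is that the classifier never sees individual phones or words: it only receives one vector per child, so no phone-level error labels are needed at training time.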

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD