Detection of Consonant Errors in Disordered Speech Based on Consonant-vowel Segment Embedding

Si-Ioi Ng; Cymie Wing-Yee Ng; Jingyu Li; Tan Lee

arXiv:2106.08536·eess.AS·June 17, 2021

TL;DR

This paper presents a neural network approach using consonant-vowel segment embeddings to improve detection of consonant errors in disordered speech, addressing limitations of previous methods for difficult consonants.

Contribution

It introduces a CV segment-based neural embedding method that enhances consonant error detection accuracy in speech sound disorder assessment.

Findings

01. Improved detection of difficult consonants using CV segments

02. Neural embeddings effectively capture co-articulation information

03. Enhanced performance over monophone-based methods

Abstract

Speech sound disorder (SSD) is a developmental disorder in which young children have persistent difficulty producing certain speech sounds at the expected age. Consonant errors are the major indicator of SSD in clinical assessment. Previous studies on automatic assessment of SSD found that detection of speech errors involving short and transitory consonants is less satisfactory. This paper investigates a neural-network-based approach to detecting consonant errors in disordered speech using consonant-vowel (CV) diphone segments, in comparison to using consonant monophone segments. The underlying assumption is that the vowel part of a CV segment carries important co-articulation information from the consonant. Speech embeddings are extracted from CV segments by a recurrent neural network model. The similarity scores between the embeddings of the test segment and…
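The abstract is cut off before it says what the test embedding is compared against or how the similarity scores are used. As a rough illustration of the general idea only, the sketch below assumes cosine similarity, a small set of reference embeddings taken from correctly produced CV segments, and a fixed decision threshold. All three are assumptions made for this example, as are the function names and the toy 3-dimensional vectors; none of them is taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def score_segment(test_emb, reference_embs):
    """Mean similarity of a test embedding to the reference embeddings.

    Averaging over references is an assumption of this sketch.
    """
    sims = [cosine_similarity(test_emb, ref) for ref in reference_embs]
    return sum(sims) / len(sims)


def is_consonant_error(test_emb, reference_embs, threshold=0.5):
    """Flag a segment as a likely error when its score falls below
    the threshold. The threshold value is hypothetical."""
    return score_segment(test_emb, reference_embs) < threshold


# Toy embeddings standing in for the output of an embedding model.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
close_segment = [0.95, 0.05, 0.0]  # points the same way as the references
far_segment = [0.0, 1.0, 0.0]      # nearly orthogonal to the references

print(is_consonant_error(close_segment, references))
print(is_consonant_error(far_segment, references))
```

In practice the embeddings would come from a trained model and the threshold would be tuned on held-out data; the sketch only shows how a similarity score can be turned into a binary error decision.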

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing

Methods: 1x1 Convolution · Convolution · Non Maximum Suppression · SSD