Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Taekyung Ahn; Yeonjung Hong; Younggon Im; Do Hyung Kim; Dayoung Kang; Joo Won Jeong; Jae Won Kim; Min Jung Kim; Ah-ra Cho; Dae-Hyun Jang; Hosung Nam

arXiv:2403.08187·cs.CL·March 14, 2024·2 cites

Open Access

TL;DR

This study fine-tuned a wav2vec 2.0 model to accurately recognize children's speech pronunciations for diagnosing speech sound disorders in Korean, aiming to replace manual transcription in clinical diagnosis.

Contribution

It introduces a specialized ASR model trained on children's speech to improve pronunciation diagnosis in speech sound disorder assessments.

Findings

01 The model's predictions matched human annotations of the children's pronunciations with about 90% accuracy

02 Demonstrated the feasibility of using ASR for clinical speech diagnosis

03 Identified a need for further improvement in recognizing unclear pronunciations

Abstract

This study presents an automatic speech recognition (ASR) model designed to diagnose pronunciation issues in children with speech sound disorders (SSDs), with the aim of replacing manual transcription in clinical procedures. Because ASR models trained for general purposes tend to map input speech onto real words, using a well-known high-performance ASR model to evaluate pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned on a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the word pronunciations matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this…
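The abstract as shown here does not say how the "about 90%" agreement with human annotations is computed. As a minimal sketch, assuming each utterance has one predicted transcription and one human-annotated transcription, the following computes two plausible agreement measures: the share of utterances transcribed identically, and a character-level accuracy based on edit distance. The function names and the sample transcriptions are hypothetical illustrations, not the authors' evaluation code or data.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def agreement(predictions, annotations):
    """Compare model transcriptions with human annotations.

    Returns the exact-match rate per utterance and a character-level
    accuracy (1 - total edits / total annotated characters).
    Assumption: both metrics are illustrative; the paper may use another.
    """
    if len(predictions) != len(annotations):
        raise ValueError("predictions and annotations must have equal length")
    if not annotations:
        raise ValueError("at least one utterance is required")

    # Normalize to NFC so each Hangul syllable block is a single code point.
    preds = [unicodedata.normalize("NFC", p) for p in predictions]
    refs = [unicodedata.normalize("NFC", a) for a in annotations]

    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("annotations must contain at least one character")

    exact = sum(p == r for p, r in zip(preds, refs))
    errors = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    return {
        "exact_match": exact / len(refs),
        "char_accuracy": 1 - errors / total_chars,
    }


# Hypothetical example: transcriptions "as pronounced", not dictionary words.
human = ["타탕", "하라버지", "고애"]
model = ["타탕", "하라버지", "고래"]  # last item "corrected" to a real word
scores = agreement(model, human)
print(scores)
```

In the hypothetical data, the third prediction shows the failure mode the abstract describes: a recognizer biased toward real words outputs the dictionary form instead of what the child actually said, which lowers both measures.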

Taxonomy

Topics: Voice and Speech Disorders · Phonetics and Phonology Research

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Self-Supervised Deep Supervision