Accurate synthesis of Dysarthric Speech for ASR data augmentation
Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey Berry

TL;DR
This paper introduces a neural speech synthesis method tailored for dysarthric speech, enhancing ASR training data and improving recognition accuracy by incorporating severity levels and pause modeling.
Contribution
A novel dysarthric speech synthesis approach using a modified neural TTS with severity and pause controls, aiding data augmentation for better dysarthric ASR performance.
Findings
Adding synthetic dysarthric speech to the ASR training data improves WER by 12.2%.
Adding the severity level and pause insertion controls to the synthesis reduces WER by 6.5%.
The synthesized speech is perceived as similar to real dysarthric speech.
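The WER figures in these findings refer to word error rate: the word-level edit distance (substitutions, deletions, insertions) between the ASR hypothesis and the reference transcript, divided by the number of reference words. The sketch below is only an illustration of that standard definition; the function name `wer` and the whitespace tokenization are assumptions made here, not the authors' evaluation code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many insertions.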
Abstract
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems can help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. Differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels are important components for dysarthric speech modeling, synthesis, and augmentation. For dysarthric speech synthesis, a modified neural multi-talker TTS is implemented by adding a dysarthria severity level coefficient and a pause insertion model to synthesize…
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research
