The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Luke Prananta; Bence Mark Halpern; Siyuan Feng; Odette Scharenborg

arXiv:2201.04908 · cs.SD · January 14, 2022

Open Access

TL;DR

This study compares GAN-based voice conversion with simple signal processing (stationary noise removal and vocoder-based time stretching) for enhancing dysarthric speech before recognition. On a phoneme recognition task, the simple methods give results comparable to those of the GAN-based models.

Contribution

The paper presents an ablation study comparing key components of GAN-based voice conversion and signal processing methods, and proposes combining MaskCycleGAN-VC with time stretching, which improves phoneme recognition for certain dysarthric speakers.

Findings

01. Simple signal processing methods achieve recognition results comparable to GAN-based methods.
02. Combining MaskCycleGAN-VC with time stretching improves recognition for some dysarthric speakers.
03. Straightforward techniques can be effective alternatives to complex GAN models.

Abstract

In this paper, we investigate several existing and a new state-of-the-art generative adversarial network (GAN)-based voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods, as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time-stretched enhancement is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time…
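
To make the idea of time stretching concrete, the following is a minimal sketch of changing the duration of a signal without changing its pitch. It uses a basic STFT phase vocoder written in NumPy, which is an assumption made for illustration: the abstract only says "vocoder-based time stretching" and does not say which vocoder or settings the authors used. The function names (`stft`, `istft`, `time_stretch`) and the parameters `rate`, `n_fft`, and `hop` are hypothetical.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window; shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add with window-energy normalisation."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor avoids dividing by near-zero window energy at the signal edges.
    return out / np.maximum(norm, 1e-3)


def time_stretch(x, rate, n_fft=512, hop=128):
    """Stretch x in time by 1/rate (rate < 1 lengthens, rate > 1 shortens).

    Illustrative phase vocoder; not the method used in the paper.
    """
    spec = stft(x, n_fft, hop)
    n_frames, n_bins = spec.shape
    # Fractional analysis-frame positions read out at the new speed.
    steps = np.arange(0, n_frames - 1, rate)
    # Phase advance expected per hop at each bin's centre frequency.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    phase = np.angle(spec[0])
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Interpolate magnitudes between the two neighbouring frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation of the measured phase advance from the expected one,
        # wrapped to [-pi, pi], then accumulated for the next output frame.
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi
    return istft(out, n_fft, hop)
```

Calling `time_stretch(x, rate=1.25)` on a waveform `x` returns a signal roughly 1/1.25 as long, while `rate=0.8` returns one roughly 1/0.8 as long; in both cases the spectral content of each frame, and therefore the perceived pitch, is approximately preserved. How a stretch factor should be chosen per speaker is not stated in the abstract and is left open here.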

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders