Data Augmentation for Pathological Speech Enhancement
Mingchi Hou, Enno Hermann, Ina Kodrasi

TL;DR
This study evaluates data augmentation techniques for improving speech enhancement models on pathological speech. It finds noise augmentation most effective but notes that performance gaps persist.
Contribution
The paper systematically compares transformative, generative, and noise augmentation strategies for pathological speech enhancement, revealing their relative effectiveness and limitations.
Findings
Noise augmentation yields the largest performance gains.
Transformative augmentation provides moderate improvements.
Generative augmentation can harm performance as the amount of synthetic data increases.
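Noise augmentation, the strategy the paper finds most effective, generally means creating extra training pairs by mixing clean speech with noise at chosen signal-to-noise ratios (SNRs). The summary does not specify the noise corpus or SNR range used, so the following is only a minimal sketch under that assumption. `mix_at_snr` is a hypothetical helper that scales a noise clip so the mixture reaches a target SNR.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: add `noise` to `clean` at the requested SNR (dB)."""
    # Tile the noise if it is shorter than the speech, then crop to length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise clip is silent; cannot reach a finite SNR")

    # Choose gain g so that clean_power / (g^2 * noise_power) == 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


# Usage: build one (noisy input, clean target) pair at a randomly drawn SNR.
# The 0-15 dB range here is an illustrative assumption, not taken from the paper.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(4000)
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(0.0, 15.0))
```

Each call yields a new noisy version of the same clean utterance, which is how noise augmentation multiplies the effective amount of training data for a speech enhancement model.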
Abstract
The performance of state-of-the-art speech enhancement (SE) models considerably degrades for pathological speech due to atypical acoustic characteristics and limited data availability. This paper systematically investigates data augmentation (DA) strategies to improve SE performance for pathological speakers, evaluating both predictive and generative SE models. We examine three DA categories, i.e., transformative, generative, and noise augmentation, assessing their impact with objective SE metrics. Experimental results show that noise augmentation consistently delivers the largest and most robust gains, transformative augmentations provide moderate improvements, while generative augmentation yields limited benefits and can harm performance as the amount of synthetic data increases. Furthermore, we show that the effectiveness of DA varies depending on the SE model, with DA being more…
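The abstract does not say which transformative augmentations were evaluated. Speed perturbation is one common example, and the sketch below illustrates it under that assumption. The hypothetical `speed_perturb` resamples a waveform by linear interpolation, which changes both tempo and pitch; a production pipeline would normally low-pass filter before downsampling to avoid aliasing.

```python
import numpy as np


def speed_perturb(x: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical helper: play `x` back `factor` times faster.

    factor > 1 shortens the signal (faster, higher pitch);
    factor < 1 lengthens it (slower, lower pitch).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(x) / factor))
    # Positions (in original sample units) at which to read the input.
    t_out = np.arange(n_out) * factor
    # Linear interpolation; positions past the end are clamped to the last sample.
    return np.interp(t_out, np.arange(len(x)), x)


# Usage: perturb a clean utterance with factors often used in speech pipelines
# (0.9, 1.0, 1.1); these values are an assumption, not from the paper.
clean = np.sin(2 * np.pi * np.arange(16000) / 80)
variants = [speed_perturb(clean, f) for f in (0.9, 1.0, 1.1)]
```

For speech enhancement training, one would typically perturb the clean utterance first and only then mix it with noise, so the clean target and the noisy input stay aligned.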
Taxonomy
Topics: Speech and Audio Processing · Voice and Speech Disorders · Speech Recognition and Synthesis
