The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement

Anastasia Kuznetsova; Aswin Sivaraman; Minje Kim

arXiv:2211.07493·eess.AS·November 15, 2022

Open Access

TL;DR

This paper explores using neural speech synthesis for data augmentation to improve personalized speech enhancement, demonstrating that high-quality synthetic data can enhance small models' performance while reducing complexity.

Contribution

The paper proposes augmenting scarce personal training data with the output of a neural speech synthesis system, so that a small personalized speech enhancement model can be trained with reduced complexity.

Findings

01. Synthetic data quality impacts PSE performance
02. Augmented PSE outperforms speaker-agnostic baseline
03. Significant complexity reduction achieved

Abstract

With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their complexity, when they are to be used in a resource-constrained environment. One promising way is personalized speech enhancement (PSE), which is a smaller and easier speech enhancement problem for small models to solve, because it focuses on a particular test-time user. To achieve the personalization goal, while dealing with the typical lack of personal data, we investigate the effect of data augmentation based on neural speech synthesis (NSS). In the proposed method, we show that the quality of the NSS system's synthetic data matters, and if they are good enough the augmented dataset can be used to improve the PSE system that outperforms the…
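The augmentation idea in the abstract can be pictured with a minimal sketch. It is not the authors' pipeline: the abstract does not specify how training pairs are built, so the sketch assumes the common speech enhancement setup in which clean utterances are mixed with noise at a target signal-to-noise ratio. The function names `mix_at_snr` and `build_augmented_pairs`, the 5 dB SNR, and the random arrays standing in for real recordings and NSS output are all hypothetical placeholders.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it.

    Assumes `noise` is at least as long as `clean` and has non-zero power.
    """
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise to the power required by the target SNR.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

def build_augmented_pairs(real_clean, synthetic_clean, noises, snr_db, rng):
    """Pool real and synthetic clean utterances; pair each with a noisy mixture."""
    pairs = []
    for clean in list(real_clean) + list(synthetic_clean):
        noise = noises[rng.integers(len(noises))]
        pairs.append((mix_at_snr(clean, noise, snr_db), clean))
    return pairs

rng = np.random.default_rng(0)
# Placeholders (assumptions): random signals stand in for audio at 16 kHz.
real_clean = [rng.standard_normal(16000) for _ in range(2)]       # the user's few recordings
synthetic_clean = [rng.standard_normal(16000) for _ in range(6)]  # utterances from an NSS system
noises = [rng.standard_normal(16000) for _ in range(3)]

pairs = build_augmented_pairs(real_clean, synthetic_clean, noises, snr_db=5.0, rng=rng)
```

Each element of `pairs` is a `(noisy, clean)` tuple that a small enhancement model could be trained on. In this sketch the synthetic utterances simply enlarge the pool of clean targets for the test-time user; how the paper weights or filters synthetic data by quality is not described in the text above.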

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research