The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement
Anastasia Kuznetsova, Aswin Sivaraman, Minje Kim

TL;DR
This paper explores using neural speech synthesis for data augmentation to improve personalized speech enhancement, demonstrating that high-quality synthetic data can enhance small models' performance while reducing complexity.
Contribution
It introduces a novel data augmentation approach using neural speech synthesis to improve personalized speech enhancement systems with reduced complexity.
Findings
Synthetic data quality impacts PSE performance
Augmented PSE outperforms speaker-agnostic baseline
Significant complexity reduction achieved
Abstract
With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their complexity, when they are to be used in a resource-constrained environment. One promising way is personalized speech enhancement (PSE), which is a smaller and easier speech enhancement problem for small models to solve, because it focuses on a particular test-time user. To achieve the personalization goal, while dealing with the typical lack of personal data, we investigate the effect of data augmentation based on neural speech synthesis (NSS). In the proposed method, we show that the quality of the NSS system's synthetic data matters, and if they are good enough the augmented dataset can be used to improve the PSE system that outperforms the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
