Towards Robust Neural Vocoding for Speech Generation: A Survey
Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-yi Lee

TL;DR
This survey examines the robustness of neural vocoders in speech synthesis, showing that speaker diversity affects performance more than language variation, and compares four vocoders on text-to-speech and voice conversion.
Contribution
It evaluates the robustness of four neural vocoders trained on five datasets, across seen and unseen speakers and languages, and offers insights into their suitability for TTS and voice conversion.
Findings
Speaker variety significantly affects vocoder robustness.
WaveNet and WaveRNN excel in TTS applications.
Parallel WaveGAN performs better in voice conversion.
Abstract
Recently, neural vocoders have been widely used in speech synthesis tasks, including text-to-speech and voice conversion. However, when there is a data distribution mismatch between training and inference, neural vocoders trained on real data often degrade in voice quality in unseen scenarios. In this paper, we train four common neural vocoders (WaveNet, WaveRNN, FFTNet, and Parallel WaveGAN) alternately on five different datasets. To study the robustness of neural vocoders, we evaluate the models using acoustic features from seen/unseen speakers, seen/unseen languages, a text-to-speech model, and a voice conversion model. We find that speaker variety is much more important than language for achieving a universal vocoder. Through our experiments, we show that WaveNet and WaveRNN are more suitable for text-to-speech models, while Parallel WaveGAN is more suitable for…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Sigmoid Activation · WaveRNN · Mixture of Logistic Distributions · Tanh Activation · Dense Connections · Convolution · Dropout
