Towards Universal Neural Vocoding with a Multi-band Excited WaveNet
Axel Roebel, Frederik Bous

TL;DR
This paper presents the Multi-Band Excited WaveNet, a neural vocoder designed to generate high-quality speech and singing voices from diverse mel spectrograms, demonstrating broad domain adaptability with less training data and fewer parameters.
Contribution
It introduces a multi-band excitation neural vocoder with differentiable components, enabling universal voice synthesis across speakers, languages, and styles with improved efficiency.
Findings
Supports a wide range of voices, languages, and expressivity.
Achieves perceptual quality comparable to state-of-the-art vocoders.
Uses less training data and fewer parameters than existing models.
Abstract
This paper introduces the Multi-Band Excited WaveNet, a neural vocoder for speaking and singing voices. It aims to advance the state of the art towards a universal neural vocoder, that is, a model that can generate voice signals from arbitrary mel spectrograms extracted from voice signals. Following the success of the DDSP model and the development of recently proposed excitation vocoders, we propose a vocoder structure consisting of multiple specialized DNNs that are combined with dedicated signal processing components. All components are implemented as differentiable operators and therefore allow joint optimization of the model parameters. To demonstrate the capacity of the model to reproduce high-quality voice signals, we evaluate the model on single- and multi-speaker/singer datasets. We conduct a subjective evaluation demonstrating that the models support a wide range of…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Dilated Causal Convolution · Mixture of Logistic Distributions · WaveNet · Differentiable Digital Signal Processing
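Dilated causal convolution, listed among the methods above, is the building block of WaveNet-style models. The following is a minimal pure-Python sketch of the operation for a single channel. It is not the authors' implementation; the function name `dilated_causal_conv` and the example weights are assumptions chosen for illustration only.

```python
def dilated_causal_conv(x, weights, dilation):
    """Single-channel dilated causal convolution (illustrative sketch).

    Computes y[t] = sum_k weights[k] * x[t - k * dilation],
    treating x as zero before t = 0, so y[t] never depends on future samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # look back k * dilation samples
            if idx >= 0:            # implicit zero padding on the left
                acc += w * x[idx]
        y.append(acc)
    return y


# Hypothetical toy input: two filter taps, spaced two samples apart.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
out = dilated_causal_conv(signal, weights=[1.0, 1.0], dilation=2)
print(out)  # [1.0, 2.0, 4.0, 6.0, 8.0]
```

Stacking such layers with increasing dilation (for example 1, 2, 4, 8) grows the receptive field exponentially with depth while keeping the number of weights per layer fixed, which is the usual motivation for the technique.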
