A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis
Prasanna Kumar Muthukumar, Alan W. Black

TL;DR
This paper introduces a data-driven, deep learning-based parameterization of the Mel Log Spectrum tailored for statistical parametric speech synthesis, aiming to improve synthesis quality over traditional Mel Cepstral coefficients.
Contribution
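It proposes a novel invertible, low-dimensional, noise-robust encoding of the Mel Log Spectrum. The encoding is learned by a tapered Stacked Denoising Autoencoder (SDA) whose unwrapped layers initialize a Multi-Layer Perceptron (MLP), which is then fine-tuned to reconstruct its input.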
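A tapered SDA is usually built by greedy layer-wise pretraining. Each denoising autoencoder layer learns to reconstruct its clean input from a corrupted copy, and each successive layer is narrower than the one before. The NumPy sketch below is a minimal illustration. Several of its details are assumptions, not settings taken from the paper:

- masking noise at 20%
- tied weights
- sigmoid hidden units with a linear reconstruction
- full-batch gradient descent
- hypothetical widths 50 → 40 → 30 → 20
- random frames standing in for Mel Log Spectrum data

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_dae_layer(X, n_hidden, corruption=0.2, lr=0.05, epochs=50, seed=0):
    """Train one denoising autoencoder layer with tied weights (assumed design).

    Masking noise zeroes a random fraction of the inputs; the layer is trained
    to reconstruct the CLEAN X from that corrupted copy.
    Returns (W, b_hid, b_vis).
    """
    rng = np.random.default_rng(seed)
    n, n_vis = X.shape
    W = rng.normal(0.0, 0.1, size=(n_vis, n_hidden))
    b_hid = np.zeros(n_hidden)
    b_vis = np.zeros(n_vis)
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) > corruption)  # corrupted input
        H = sigmoid(X_tilde @ W + b_hid)                  # encode
        X_hat = H @ W.T + b_vis                           # linear decode, tied W
        d_out = (X_hat - X) / n                           # grad of 0.5 * per-frame SSE, averaged
        d_pre = (d_out @ W) * H * (1.0 - H)               # back through the sigmoid
        gW = X_tilde.T @ d_pre + d_out.T @ H              # W is used by encoder and decoder
        W -= lr * gW
        b_hid -= lr * d_pre.sum(axis=0)
        b_vis -= lr * d_out.sum(axis=0)
    return W, b_hid, b_vis


def pretrain_tapered_sda(X, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining; each layer is narrower than the one before."""
    params, inp = [], X
    for size in layer_sizes:
        W, b_hid, b_vis = train_dae_layer(inp, size, **kwargs)
        params.append((W, b_hid, b_vis))
        inp = sigmoid(inp @ W + b_hid)  # clean codes become the next layer's input
    return params


# Hypothetical data: 200 random 50-dim frames standing in for Mel Log Spectra.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))
params = pretrain_tapered_sda(X, [40, 30, 20])
print([p[0].shape for p in params])  # [(50, 40), (40, 30), (30, 20)]
```

The denoising step is the key design choice. The layer is trained to undo the corruption, not merely to copy its input. This is one common way to encourage codes that are robust to noise.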
Findings
Improved speech synthesis quality with the new parameterization
Robustness to noise in the encoding process
Better suited to the specific requirements of synthesis than Mel Cepstral coefficients
Abstract
Nearly all Statistical Parametric Speech Synthesizers today use Mel Cepstral coefficients as the vocal tract parameterization of the speech signal. Mel Cepstral coefficients were never intended to work in a parametric speech synthesis framework, but as yet, there has been little success in creating a better parameterization that is more suited to synthesis. In this paper, we use deep learning algorithms to investigate a data-driven parameterization technique that is designed for the specific requirements of synthesis. We create an invertible, low-dimensional, noise-robust encoding of the Mel Log Spectrum by training a tapered Stacked Denoising Autoencoder (SDA). This SDA is then unwrapped and used as the initialization for a Multi-Layer Perceptron (MLP). The MLP is fine-tuned by training it to reconstruct the input at the output layer. This MLP is then split down the middle to form…
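The steps in the abstract are: unwrap the SDA into an MLP, fine-tune the MLP to reconstruct its input, then split it down the middle. They can be sketched as follows. This is a minimal, hypothetical illustration, and it differs from the paper's setup in these ways:

- To keep it self-contained, small random weights stand in for the pretrained SDA layers.
- The decoder half starts from the transposed encoder weights.
- The activations, learning rate, epochs, and layer widths are assumptions, not the authors' settings.
- The abstract is cut off before saying what the two halves form. Treating them as an encoder and a decoder is an interpretation made here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def unwrap(sda_params):
    """Mirror (W, b_hid, b_vis) layers into an MLP: the encoder layers in order,
    then their transposes in reverse order. Copies untie the two halves."""
    layers = [(W.copy(), b_hid.copy()) for W, b_hid, _ in sda_params]
    layers += [(W.T.copy(), b_vis.copy()) for W, _, b_vis in reversed(sda_params)]
    return layers


def forward(layers, X):
    """Activations of every layer: sigmoid everywhere except a linear output."""
    acts = [X]
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        acts.append(z if i == len(layers) - 1 else sigmoid(z))
    return acts


def fine_tune(layers, X, lr=0.05, epochs=100):
    """Backpropagate the reconstruction error of the clean input through all layers."""
    n = X.shape[0]
    for _ in range(epochs):
        acts = forward(layers, X)
        delta = (acts[-1] - X) / n  # grad at the linear output layer
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # propagate with the pre-update W, through the sigmoid
                delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
            layers[i] = (W - lr * gW, b - lr * gb)
    return layers


def split(layers):
    """Cut the fine-tuned MLP at its bottleneck into encode / decode functions."""
    mid = len(layers) // 2
    enc, dec = layers[:mid], layers[mid:]

    def encode(X):
        for W, b in enc:
            X = sigmoid(X @ W + b)
        return X

    def decode(C):
        for i, (W, b) in enumerate(dec):
            C = C @ W + b
            if i < len(dec) - 1:
                C = sigmoid(C)
        return C

    return encode, decode


# Hypothetical stand-ins: random weights in place of a pretrained tapered SDA
# (widths 50 -> 40 -> 30 -> 20) and random frames in place of Mel Log Spectra.
rng = np.random.default_rng(0)
sizes = [50, 40, 30, 20]
sda_params = [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b), np.zeros(a))
              for a, b in zip(sizes[:-1], sizes[1:])]
X = rng.normal(size=(200, 50))


def recon_loss(layers):
    return 0.5 * np.mean(np.sum((forward(layers, X)[-1] - X) ** 2, axis=1))


layers = unwrap(sda_params)
before = recon_loss(layers)
layers = fine_tune(layers, X)
after = recon_loss(layers)
encode, decode = split(layers)
codes = encode(X)  # low-dimensional parameterization, shape (200, 20)
```

Because the split happens at the bottleneck, `decode(encode(X))` reproduces the full MLP's output exactly. The 20-dimensional `codes` are what such a scheme would use as the vocal tract parameterization in place of Mel Cepstral coefficients.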
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Denoising Autoencoder
