Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

TL;DR
This paper introduces SVQ-VAE, a novel neural TTS architecture with a split vector quantizer that improves naturalness and predictability of the acoustic space, enabling more efficient text-to-speech synthesis.
Contribution
The paper proposes SVQ-VAE, a new architecture that enhances VAE and VQ-VAE for neural TTS by using a split vector quantizer for better representation and efficiency.
Findings
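SVQ-VAE achieves a statistically significant improvement in naturalness over the VAE and VQ-VAE models.
Predicting the latent acoustic space from text closes 32% of the gap between standard constant vector synthesis and vocoded recordings.
The discretized latent space is small enough to be predicted efficiently from text.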
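One plausible reading of the 32% figure is a relative gap closure: how far a system moves from the baseline (constant vector synthesis) toward the upper reference (vocoded recordings). The sketch below illustrates that arithmetic only. The function name `gap_closed` and the scores in the usage line are made up for illustration and are not taken from the paper, which may compute the figure differently.

```python
def gap_closed(baseline: float, system: float, topline: float) -> float:
    """Fraction of the baseline-to-topline gap recovered by `system`.

    Assumed definition (not confirmed by the paper):
        (system - baseline) / (topline - baseline)
    """
    if topline == baseline:
        raise ValueError("topline and baseline must differ")
    return (system - baseline) / (topline - baseline)


# Hypothetical listening-test scores, chosen only to illustrate the arithmetic.
example = gap_closed(baseline=50.0, system=58.0, topline=75.0)
print(f"{example:.0%} of the gap closed")
```

Under this reading, a value of 0 means no improvement over constant vector synthesis and a value of 1 means parity with vocoded recordings.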
Abstract
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) architectures. Compared to these previous architectures, our proposed model retains the benefits of using an utterance-level bottleneck, while keeping significant representation power and a discretized latent space small enough for efficient prediction from text. We train the model on recordings in the expressive task-oriented dialogues domain and show that SVQ-VAE achieves a statistically significant improvement in naturalness over the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE latent acoustic space is predictable from text, reducing the gap between the standard constant vector synthesis and vocoded recordings by 32%.
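The idea of a split vector quantizer can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the utterance-level latent vector is cut into equal-width sub-vectors and that each sub-vector is snapped to the nearest entry of its own small codebook. The names `nearest_code` and `split_quantize`, the codebook sizes, and the random codebooks are all hypothetical; in a trained model the codebooks would be learned.

```python
import math
import random


def nearest_code(sub, codebook):
    """Index of the codebook entry closest to `sub` (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(sub, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def split_quantize(z, codebooks):
    """Split `z` into len(codebooks) equal parts; quantize each part separately."""
    n_splits = len(codebooks)
    if len(z) % n_splits != 0:
        raise ValueError("latent size must be divisible by the number of splits")
    width = len(z) // n_splits
    indices, quantized = [], []
    for s, codebook in enumerate(codebooks):
        sub = z[s * width:(s + 1) * width]
        idx = nearest_code(sub, codebook)
        indices.append(idx)              # one discrete symbol per split
        quantized.extend(codebook[idx])  # concatenated quantized latent
    return indices, quantized


# Hypothetical sizes: 4 splits, 8 codes per split, 2 dimensions per sub-vector.
N_SPLITS, CODES_PER_SPLIT, SUB_DIM = 4, 8, 2
rng = random.Random(0)
codebooks = [
    [[rng.uniform(-1, 1) for _ in range(SUB_DIM)] for _ in range(CODES_PER_SPLIT)]
    for _ in range(N_SPLITS)
]
z = [rng.uniform(-1, 1) for _ in range(N_SPLITS * SUB_DIM)]

indices, z_q = split_quantize(z, codebooks)
print("code indices per split:", indices)
print("stored codes:", N_SPLITS * CODES_PER_SPLIT)
print("distinct quantized latents:", CODES_PER_SPLIT ** N_SPLITS)
```

With these illustrative sizes, 32 stored codes yield 4096 distinct quantized latents, while a text-side predictor would only need to choose among 8 classes for each of the 4 splits. That is one way to read the abstract's claim of keeping representation power while the discretized space stays small enough to predict from text.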
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: VQ-VAE
