DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi, Shinji Watanabe

TL;DR
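DiscreTalk is an end-to-end text-to-speech system that treats speech synthesis as a machine translation task: a VQ-VAE converts speech into a sequence of discrete symbols, and a Transformer-NMT model translates input text into that sequence. This framing lets the system reuse NMT techniques and yields more natural speech than a conventional Transformer-TTS model.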
Contribution
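The paper presents a TTS approach that combines a VQ-VAE with a Transformer-NMT model. Because the VQ-VAE learns its speech representation directly from data, the feature-extraction hyperparameters required by conventional E2E-TTS models no longer need to be considered. The approach also reduces over-smoothing.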
Findings
Outperforms conventional Transformer-TTS in naturalness
Achieves performance comparable to VQ-VAE reconstruction
Utilizes NMT techniques like beam search and subword units
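Because the output is a sequence of discrete symbols, decoding can use beam search in the same way an NMT system does. The following is a minimal sketch of that idea, not the authors' implementation: the function `beam_search`, the stand-in model `toy_model`, its probability table, and the symbol IDs (with `2` as an assumed end-of-sequence marker) are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[int, ...], float]  # (symbol sequence, log-probability)


def beam_search(
    next_log_probs: Callable[[Tuple[int, ...]], Dict[int, float]],
    beam_size: int,
    max_len: int,
    eos: int,
) -> Hypothesis:
    """Return the highest-scoring symbol sequence found by beam search."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next symbol.
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the best `beam_size` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Hypotheses that hit max_len without emitting EOS still compete.
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])


EOS = 2  # hypothetical end-of-sequence symbol


def toy_model(prefix: Tuple[int, ...]) -> Dict[int, float]:
    """Hypothetical stand-in for a trained decoder's next-symbol distribution."""
    table = {
        (): {0: 0.6, 1: 0.4},
        (0,): {0: 0.55, EOS: 0.45},
        (1,): {EOS: 0.9, 0: 0.1},
    }
    probs = table.get(prefix, {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


best_seq, best_score = beam_search(toy_model, beam_size=2, max_len=5, eos=EOS)
greedy_seq, greedy_score = beam_search(toy_model, beam_size=1, max_len=5, eos=EOS)
print(best_seq, math.exp(best_score))
print(greedy_seq, math.exp(greedy_score))
```

In this toy table, a beam of size 1 (greedy decoding) commits to symbol `0` first and ends with a lower overall probability than a beam of size 2, which keeps the initially less likely symbol `1` alive long enough to find the better sequence.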
Abstract
This paper proposes a new end-to-end text-to-speech (E2E-TTS) model based on neural machine translation (NMT). The proposed model consists of two components: a non-autoregressive vector quantized variational autoencoder (VQ-VAE) model and an autoregressive Transformer-NMT model. The VQ-VAE model learns a mapping function from a speech waveform into a sequence of discrete symbols, and then the Transformer-NMT model is trained to estimate this discrete symbol sequence from a given input text. Since the VQ-VAE model can learn such a mapping in a fully data-driven manner, we do not need to consider hyperparameters of the feature extraction required in the conventional E2E-TTS models. Thanks to the use of discrete symbols, we can use various techniques developed in NMT and automatic speech recognition (ASR) such as beam search, subword units, and fusions with a language model. Furthermore,…
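The step that turns continuous speech features into discrete symbols can be pictured as a nearest-codebook lookup, which is the quantization operation at the core of a VQ-VAE. The sketch below is a simplified illustration under stated assumptions, not the paper's model: the encoder is omitted, and the function `quantize`, the two-dimensional `codebook`, and the `frames` values are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Map each frame vector to the index of its nearest codebook entry."""
    indices: List[int] = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices


# Hypothetical 3-entry codebook and 4 encoder output frames (2-D for readability).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7], [0.2, 0.1]]

symbols = quantize(frames, codebook)
print(symbols)
```

The resulting list of integer indices plays the role of the target "sentence" that a text-to-symbol translation model would be trained to predict. In a real VQ-VAE the codebook is learned jointly with the encoder and decoder rather than fixed by hand.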
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: VQ-VAE
