TL;DR
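StyleMelGAN is a lightweight neural vocoder that uses temporal adaptive normalization to style a low-dimensional noise vector with the acoustic features of the target speech. It is trained adversarially with random-window discriminators and a multi-scale spectral reconstruction loss, synthesizes high-fidelity speech several times faster than real time, and is reported to outperform prior neural vocoders in perceptual quality.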
Contribution
The paper introduces StyleMelGAN, a neural vocoder that combines temporal adaptive normalization with adversarial training regularized by a multi-scale spectral reconstruction loss, aiming at high-quality speech synthesis with low computational complexity.
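To make the idea of temporal adaptive normalization more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes the layer normalizes each channel of an activation over time and then rescales and shifts it, sample by sample, with a gain and bias derived from the conditioning features. The function name `temporal_adaptive_norm`, the plain matrix projections `w_gamma` and `w_beta` (standing in for learned layers), and the nearest-neighbour upsampling are all illustrative assumptions.

```python
import numpy as np

def temporal_adaptive_norm(x, cond, w_gamma, w_beta, eps=1e-5):
    """Illustrative sketch only (hypothetical names and shapes).

    x:       activation, shape (channels, time)
    cond:    conditioning features (e.g. mel frames), shape (n_feats, frames)
    w_gamma: projection to per-sample gain, shape (channels, n_feats)
    w_beta:  projection to per-sample bias, shape (channels, n_feats)
    """
    channels, time = x.shape

    # Normalize every channel over the time axis (no learned affine here).
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    x_norm = (x - mean) / (std + eps)

    # Assumption: nearest-neighbour upsampling aligns the conditioning
    # frames with the time resolution of the activation.
    idx = np.arange(time) * cond.shape[1] // time
    cond_up = cond[:, idx]

    # Time-varying gain and bias derived from the conditioning features.
    gamma = w_gamma @ cond_up  # shape (channels, time)
    beta = w_beta @ cond_up    # shape (channels, time)
    return gamma * x_norm + beta
```

Because the gain and bias vary along the time axis, the acoustic features can shape the activation differently at every position, which is what distinguishes this from a normalization layer with a single global affine transform.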
Findings
Outperforms prior neural vocoders in perceptual quality.
Synthesizes speech several times faster than real time on CPUs and GPUs.
Demonstrates superior performance in copy-synthesis and TTS scenarios.
Abstract
In recent years, neural vocoders have surpassed classical speech generation approaches in naturalness and perceptual quality of the synthesized speech. Computationally heavy models like WaveNet and WaveGlow achieve best results, while lightweight GAN models, e.g. MelGAN and Parallel WaveGAN, remain inferior in terms of perceptual quality. We therefore propose StyleMelGAN, a lightweight neural vocoder allowing synthesis of high-fidelity speech with low computational complexity. StyleMelGAN employs temporal adaptive normalization to style a low-dimensional noise vector with the acoustic features of the target speech. For efficient training, multiple random-window discriminators adversarially evaluate the speech signal analyzed by a filter bank, with regularization provided by a multi-scale spectral reconstruction loss. The highly parallelizable speech generation is several times faster…
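The multi-scale spectral reconstruction loss mentioned in the abstract can be sketched as follows. The abstract does not give the exact terms or resolutions, so this example assumes one common formulation: a spectral-convergence term plus a log-magnitude L1 term, averaged over several STFT resolutions. The function names, the `(n_fft, hop)` pairs, and the Hann window are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, n_fft, hop):
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_spectral_loss(
    generated,
    target,
    resolutions=((512, 128), (1024, 256), (2048, 512)),  # assumed values
    eps=1e-7,
):
    """Average spectral distance between two waveforms over several scales."""
    total = 0.0
    for n_fft, hop in resolutions:
        mag_g = stft_magnitude(generated, n_fft, hop)
        mag_t = stft_magnitude(target, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error of magnitudes.
        spectral_convergence = np.linalg.norm(mag_t - mag_g) / (
            np.linalg.norm(mag_t) + eps
        )
        # Mean absolute difference of log magnitudes.
        log_magnitude = np.mean(
            np.abs(np.log(mag_t + eps) - np.log(mag_g + eps))
        )
        total += spectral_convergence + log_magnitude
    return total / len(resolutions)
```

Comparing spectra at several window sizes trades time resolution against frequency resolution, so no single analysis setting dominates what the generator is penalized for. Both waveforms must be at least as long as the largest `n_fft`.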
Taxonomy
Methods: Dilated Convolution · Residual Connection · Invertible 1x1 Convolution · MelGAN Residual Block · 1x1 Convolution · Affine Coupling · Grouped Convolution · Dense Connections · Normalizing Flows
