MSR-NV: Neural Vocoder Using Multiple Sampling Rates
Kentaro Mitsui, Kei Sawada

TL;DR
This paper introduces MSR-NV, a neural vocoder capable of generating high-quality speech at multiple sampling rates within a single model, eliminating the need for re-training for different rates.
Contribution
The study presents a novel multi-sampling-rate neural vocoder that extends Parallel WaveGAN, enabling efficient high-quality speech synthesis across various sampling rates in one model.
Findings
Achieves higher subjective speech quality than separately trained single-rate models at 16, 24, and 48 kHz.
Maintains inference speed despite supporting multiple sampling rates.
Leverages lower-sampling-rate speech to improve the quality of synthetic speech at higher rates.
Abstract
The development of neural vocoders (NVs) has resulted in the high-quality and fast generation of waveforms. However, conventional NVs target a single sampling rate and require re-training when applied to different sampling rates. A suitable sampling rate varies from application to application due to the trade-off between speech quality and generation speed. In this study, we propose a method to handle multiple sampling rates in a single NV, called the MSR-NV. By generating waveforms step-by-step starting from a low sampling rate, MSR-NV can efficiently learn the characteristics of each frequency band and synthesize high-quality speech at multiple sampling rates. It can be regarded as an extension of the previously proposed NVs, and in this study, we extend the structure of Parallel WaveGAN (PWG). Experimental evaluation results demonstrate that the proposed method achieves remarkably…
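The step-by-step generation described in the abstract can be illustrated with a toy sketch. This is not the authors' network: the linear-interpolation upsampler and the `refine` function are hypothetical stand-ins for learned stages, and the rate list simply reuses the 16, 24, and 48 kHz values from the Findings. The sketch only shows how a single coarse-to-fine pass can expose a waveform at every intermediate sampling rate.

```python
from fractions import Fraction

# Sampling rates named in the Findings; the cascade order is an assumption.
RATES_HZ = [16000, 24000, 48000]


def upsample_linear(x, src_hz, dst_hz):
    """Resample x from src_hz to dst_hz by linear interpolation.

    Toy stand-in for a learned upsampling layer (assumption, not from the paper).
    """
    n_out = len(x) * dst_hz // src_hz
    step = Fraction(src_hz, dst_hz)  # exact source position increment per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        k = int(pos)                      # index of the left neighbour
        frac = float(pos - k)             # fractional distance to the right neighbour
        right = x[min(k + 1, len(x) - 1)]  # clamp at the final sample
        out.append((1.0 - frac) * x[k] + frac * right)
    return out


def refine(x, rate_hz):
    """Hypothetical placeholder for a learned stage that would add the
    detail of the newly available frequency band. Here it is the identity."""
    return list(x)


def generate_multi_rate(base_wave):
    """Run the cascade once and return a waveform for every rate in RATES_HZ."""
    current_hz = RATES_HZ[0]
    current = list(base_wave)
    outputs = {current_hz: current}
    for next_hz in RATES_HZ[1:]:
        current = refine(upsample_linear(current, current_hz, next_hz), next_hz)
        current_hz = next_hz
        outputs[next_hz] = current
    return outputs


if __name__ == "__main__":
    # 10 ms of a constant signal at 16 kHz, used only to show the output lengths.
    waves = generate_multi_rate([0.5] * 160)
    for rate in RATES_HZ:
        print(rate, len(waves[rate]))
```

Because each stage's output is kept, an application that only needs the lowest rate could stop after the first stage, which is one plausible reading of the quality versus speed trade-off the abstract mentions.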
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Speech and Audio Processing
Methods: Phase Shuffle · WGAN-GP Loss · Dropout · Convolution · Dense Connections · Tanh Activation · WaveGAN
