TL;DR
This paper introduces QPPWG, a non-autoregressive waveform generator that incorporates pitch-dependent structures to enhance pitch control and speech quality, outperforming previous models in both objective and subjective evaluations.
Contribution
The paper presents a novel quasi-periodic structure that adds pitch-dependent dilated convolutions to Parallel WaveGAN (PWG), improving pitch controllability and interpretability in raw waveform speech synthesis.
Findings
QPPWG outperforms PWG in scaled pitch scenarios.
QPPWG demonstrates better spectral and excitation modeling.
QPPWG's intermediate network outputs are more interpretable.
Abstract
In this paper, we propose a quasi-periodic parallel WaveGAN (QPPWG) waveform generative model, which applies a quasi-periodic (QP) structure to a parallel WaveGAN (PWG) model using pitch-dependent dilated convolution networks (PDCNNs). PWG is a small-footprint GAN-based raw waveform generative model, whose generation time is much faster than real time because of its compact model and non-autoregressive (non-AR) and non-causal mechanisms. Although PWG achieves high-fidelity speech generation, the generic and simple network architecture lacks pitch controllability for an unseen auxiliary fundamental frequency ($F_0$) feature such as a scaled $F_0$. To improve the pitch controllability and speech modeling capability, we apply a QP structure with PDCNNs to PWG, which introduces pitch information to the network by dynamically changing the network architecture corresponding to the…
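The idea of a pitch-dependent dilated convolution can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The dilation rule used here (sampling rate divided by the product of $F_0$ and a "dense factor"), the fallback to a dilation of 1 for unvoiced samples, and the names `pitch_dependent_dilations`, `pdcnn_1d`, `dense_factor`, and `base_dilation` are all assumptions made for illustration and are not stated in the text above.

```python
def pitch_dependent_dilations(f0_hz, sample_rate=22050, dense_factor=4, base_dilation=1):
    """Return one dilation size per sample (illustrative sketch only).

    Assumed rule: E_t = sample_rate / (F0_t * dense_factor), so the tap
    spacing tracks the pitch period. Samples with F0 <= 0 (treated here as
    unvoiced) fall back to a scale of 1, which is also an assumption.
    """
    dilations = []
    for f0 in f0_hz:
        if f0 <= 0:
            scale = 1
        else:
            scale = max(1, int(sample_rate / (f0 * dense_factor)))
        dilations.append(base_dilation * scale)
    return dilations


def pdcnn_1d(x, dilations, w_past, w_cur, w_future):
    """Non-causal 3-tap convolution whose tap spacing changes at every step.

    Out-of-range taps are treated as zeros (zero padding).
    """
    n = len(x)
    y = []
    for t in range(n):
        d = dilations[t]
        past = x[t - d] if t - d >= 0 else 0.0
        future = x[t + d] if t + d < n else 0.0
        y.append(w_past * past + w_cur * x[t] + w_future * future)
    return y


# Doubling F0 halves the tap spacing under the assumed rule.
base = pitch_dependent_dilations([100.0, 100.0], sample_rate=16000)
scaled = pitch_dependent_dilations([200.0, 200.0], sample_rate=16000)
```

Under these assumptions, scaling the auxiliary $F_0$ directly changes the receptive-field spacing instead of relying on the network to have seen that $F_0$ range during training, which is the intuition behind the pitch controllability claim.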
Taxonomy
Methods: Convolution · Dense Connections · Tanh Activation · WGAN-GP Loss · Dilated Convolution · Phase Shuffle · Dropout · WaveGAN
