TL;DR
This paper introduces Quasi-Periodic WaveNet (QPNet), an autoregressive waveform model that uses pitch-dependent dilated convolutions to improve pitch control and to model quasi-periodic signals such as speech more effectively.
Contribution
The paper proposes pitch-dependent dilated convolution neural networks (PDCNNs), which adjust the network architecture dynamically according to pitch, and a cascaded structure for better modeling of periodic signals.
Findings
PDCNNs improve pitch controllability for unseen F0 features.
Cascaded structure enhances speech generation quality.
QPNet outperforms vanilla WaveNet in modeling quasi-periodic signals.
Abstract
In this paper, a pitch-adaptive waveform generative model named Quasi-Periodic WaveNet (QPNet) is proposed to improve the limited pitch controllability of vanilla WaveNet (WN) using pitch-dependent dilated convolution neural networks (PDCNNs). Specifically, as a probabilistic autoregressive generation model with stacked dilated convolution layers, WN achieves high-fidelity audio waveform generation. However, the pure-data-driven nature and the lack of prior knowledge of audio signals degrade the pitch controllability of WN. For instance, it is difficult for WN to precisely generate the periodic components of audio signals when the given auxiliary fundamental frequency (F0) features are outside the range observed in the training data. To address this problem, QPNet with two novel designs is proposed. First, the PDCNN component is applied to dynamically change the network…
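The idea of a dilation that follows the pitch can be illustrated with a minimal sketch. The abstract is truncated before it states the exact rule, so the scaling used here (sampling rate divided by F0 times a "dense factor") is an assumption for illustration, as are the function names, the default parameters, and the fallback for unvoiced frames. This is not the authors' implementation.

```python
def pitch_dependent_dilations(f0, sample_rate, base_dilation=1, dense_factor=4):
    """Return one dilation (in samples) per time step.

    Assumed rule (hypothetical, not taken from the text above): scale the
    base dilation by sample_rate / (f0 * dense_factor), so that one pitch
    cycle is spanned by roughly `dense_factor` taps regardless of F0.
    Steps with f0 <= 0 (treated here as unvoiced) keep the fixed base dilation.
    """
    dilations = []
    for f in f0:
        if f <= 0:
            dilations.append(base_dilation)
        else:
            scale = max(1, round(sample_rate / (f * dense_factor)))
            dilations.append(scale * base_dilation)
    return dilations


def causal_two_tap_conv(x, dilations, w_past=0.5, w_now=0.5):
    """Two-tap causal convolution whose look-back distance varies per sample."""
    y = []
    for t, d in enumerate(dilations):
        # Samples before the start of the signal are treated as zero.
        past = x[t - d] if t - d >= 0 else 0.0
        y.append(w_past * past + w_now * x[t])
    return y


if __name__ == "__main__":
    # Higher F0 means a shorter pitch period, hence a smaller dilation.
    print(pitch_dependent_dilations([100.0, 200.0, 0.0], sample_rate=16000))
    print(causal_two_tap_conv([1.0, 2.0, 3.0, 4.0], [1, 1, 2, 2]))
```

Because the look-back distance is computed from the F0 value itself rather than learned from data, the same layer can in principle follow a pitch that was never seen in training, which is the kind of controllability the abstract describes.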
Taxonomy
Methods: Dilated Causal Convolution · Mixture of Logistic Distributions · Convolution · WaveNet · Dilated Convolution
