LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis
Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, and Hong-Goo Kang

TL;DR
LP-WaveNet introduces a linear prediction-based neural vocoder that jointly models the interactions between the vocal source and the vocal tract, significantly improving speech synthesis quality over conventional WaveNet vocoders.
Contribution
The paper presents a novel LP-WaveNet vocoder that jointly trains the vocal source and vocal tract components within a WaveNet framework, addressing the source/filter mismatch and synthetic noise of earlier vocoders that generate only the vocal source.
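The coupling between source and tract can be illustrated with the standard LP relation s[n] = p[n] + e[n], where the prediction p[n] = sum_k a_k s[n-k] is fixed once the past samples and LP coefficients are known. If a network outputs a distribution for the excitation e[n], the distribution of the speech sample s[n] is that same distribution shifted by p[n], so a likelihood computed on speech samples trains the source model with the vocal tract filter in the loop. The sketch below assumes a Gaussian mixture output; the names `GaussianMixture`, `lp_prediction`, and `speech_mixture` are hypothetical, and the truncated abstract does not confirm the paper's exact parameterization.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianMixture:
    """Hypothetical mixture-density output for one sample."""
    weights: List[float]  # mixture weights, assumed to sum to 1
    means: List[float]
    scales: List[float]   # standard deviations, all > 0

    def log_prob(self, x: float) -> float:
        """Log-density of x under the mixture, using log-sum-exp for stability."""
        terms = [
            math.log(w) - math.log(s * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - m) / s) ** 2
            for w, m, s in zip(self.weights, self.means, self.scales)
        ]
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))

    def sample(self, rng: random.Random) -> float:
        """Pick a component by weight, then draw from that Gaussian."""
        i = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return rng.gauss(self.means[i], self.scales[i])


def lp_prediction(history: List[float], a: List[float]) -> float:
    """p[n] = sum_k a[k] * s[n - k]; history[-1] is s[n - 1]."""
    return sum(a_k * history[-k] for k, a_k in enumerate(a, start=1))


def speech_mixture(excitation: GaussianMixture, p: float) -> GaussianMixture:
    """s[n] = p[n] + e[n] with p[n] fixed by past samples, so shifting every
    excitation mean by p[n] gives the distribution of s[n]."""
    return GaussianMixture(excitation.weights, [m + p for m in excitation.means], excitation.scales)


if __name__ == "__main__":
    a = [1.3, -0.8]        # hypothetical LP coefficients for the current frame
    history = [0.2, 0.5]   # ..., s[n-2], s[n-1]
    exc = GaussianMixture([0.7, 0.3], [0.0, 0.1], [0.05, 0.2])  # hypothetical network output
    p = lp_prediction(history, a)  # 1.3 * 0.5 - 0.8 * 0.2, about 0.49
    s_dist = speech_mixture(exc, p)
    s_n = 0.52
    # Unit-Jacobian change of variables: both log-likelihoods agree up to rounding.
    print(s_dist.log_prob(s_n), exc.log_prob(s_n - p))
    # In autoregressive generation, the drawn sample would be appended to history.
    print(s_dist.sample(random.Random(0)))
```

Because the shift has unit Jacobian, maximizing the likelihood of s[n] under the shifted mixture is the same as maximizing the likelihood of the residual e[n] = s[n] - p[n], while sampling directly yields speech rather than a residual that must be filtered afterwards.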
Findings
Outperforms conventional WaveNet vocoders objectively and subjectively.
Achieves a mean opinion score (MOS) of 4.47 in the TTS evaluation.
Effectively models interactions between vocal source and tract.
Abstract
We propose a linear prediction (LP)-based waveform generation method within a WaveNet vocoding framework. WaveNet-based neural vocoders have significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to train a neural vocoder effectively when the target database contains a massive amount of acoustic information, such as prosody, style, or expressiveness. As a solution, approaches that use a neural vocoder to generate only the vocal source component have been proposed. However, these tend to produce synthetic noise because the vocal source component is handled independently of the entire speech production process, which inevitably introduces a mismatch between the vocal source and the vocal tract filter. To address this problem, we propose an LP-WaveNet vocoder, where the complicated interactions between vocal source and vocal…
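The source-only approaches described in the abstract rely on LP analysis to split speech into an all-pole vocal tract filter and a residual that serves as the vocal source. A minimal pure-Python sketch of that decomposition follows, assuming the autocorrelation method with the Levinson-Durbin recursion (a common choice, not necessarily the one used in the paper) and a toy pulse-train signal instead of real speech. All function names are hypothetical.

```python
import random
from typing import List, Tuple


def autocorrelation(x: List[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LP normal equations for predictor coefficients a[1..order],
    so that p[n] = sum_k a[k] * s[n - k]. Also returns the final error power."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def inverse_filter(s: List[float], a: List[float]) -> List[float]:
    """Vocal source estimate: e[n] = s[n] - sum_k a[k] * s[n - k] (zero initial state)."""
    order = len(a)
    return [
        s[n] - sum(a[k - 1] * s[n - k] for k in range(1, order + 1) if n - k >= 0)
        for n in range(len(s))
    ]


def synthesis_filter(e: List[float], a: List[float]) -> List[float]:
    """All-pole vocal tract filter: s[n] = e[n] + sum_k a[k] * s[n - k]."""
    order = len(a)
    s: List[float] = []
    for n in range(len(e)):
        pred = sum(a[k - 1] * s[n - k] for k in range(1, order + 1) if n - k >= 0)
        s.append(e[n] + pred)
    return s


if __name__ == "__main__":
    random.seed(0)
    true_a = [1.3, -0.8]  # hypothetical stable resonant "vocal tract"
    # Toy glottal-like excitation: a pulse every 80 samples plus a little noise.
    excitation = [(1.0 if n % 80 == 0 else 0.0) + random.gauss(0.0, 0.01) for n in range(2000)]
    speech = synthesis_filter(excitation, true_a)
    a_hat, _ = levinson_durbin(autocorrelation(speech, 2), 2)
    residual = inverse_filter(speech, a_hat)
    rebuilt = synthesis_filter(residual, a_hat)
    print("estimated LP coefficients:", [round(c, 3) for c in a_hat])
    print("max reconstruction error:", max(abs(x - y) for x, y in zip(speech, rebuilt)))
```

The estimated coefficients should land close to `true_a`, and filtering the residual back through the same all-pole filter recovers the input up to rounding error. A vocoder that generates only `residual` depends on this round trip; if the generated residual does not match what the filter expects, the mismatch the abstract describes can appear.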
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
