LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis

Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, and Hong-Goo Kang

arXiv:1811.11913·eess.AS·March 5, 2020·19 cites

TL;DR

LP-WaveNet introduces a linear prediction-based neural vocoder that jointly models vocal source and tract interactions, significantly improving speech synthesis quality over traditional WaveNet vocoders.

Contribution

The paper presents LP-WaveNet, a vocoder that jointly trains the vocal source and vocal tract components within a WaveNet framework, addressing the source-filter mismatch and synthetic noise of earlier approaches that model the vocal source alone.

Findings

01 Outperforms conventional WaveNet vocoders in both objective and subjective evaluations.

02 Achieves a mean opinion score (MOS) of 4.47 in TTS evaluation.

03 Effectively models interactions between the vocal source and the vocal tract.

Abstract

We propose a linear prediction (LP)-based waveform generation method within the WaveNet vocoding framework. WaveNet-based neural vocoders have significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to train a neural vocoder effectively when the target database contains a massive amount of acoustic information such as prosody, style, or expressiveness. As a solution, approaches that generate only the vocal source component with a neural vocoder have been proposed. However, they tend to generate synthetic noise because the vocal source component is handled independently, without considering the entire speech production process, which inevitably leads to a mismatch between the vocal source and the vocal tract filter. To address this problem, we propose an LP-WaveNet vocoder, where the complicated interactions between vocal source and vocal…
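The source-filter split that the abstract refers to can be illustrated with classical linear prediction. The following is a minimal NumPy sketch, not the paper's method: it assumes the standard LP model x[n] = sum_k a[k] x[n-k] + e[n], where the coefficients a stand in for the vocal tract filter and the residual e for the vocal source. The function names, the toy AR(2) signal, and the order of 2 are all illustrative assumptions; LP-WaveNet itself models these components jointly inside WaveNet rather than by the explicit filtering shown here.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Estimate LP coefficients a[1..p] from the autocorrelation normal equations."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(x, a):
    """Analysis: e[n] = x[n] - sum_k a[k] * x[n-k], with zeros before n = 0."""
    p = len(a)
    padded = np.concatenate([np.zeros(p), x])
    e = np.empty(len(x))
    for n in range(len(x)):
        past = padded[n:n + p][::-1]  # x[n-1], x[n-2], ..., x[n-p]
        e[n] = x[n] - np.dot(a, past)
    return e

def lp_synthesis(e, a):
    """Synthesis: y[n] = e[n] + sum_k a[k] * y[n-k], the inverse of lp_residual."""
    p = len(a)
    y = np.zeros(len(e) + p)
    for n in range(len(e)):
        past = y[n:n + p][::-1]  # y[n-1], y[n-2], ..., y[n-p]
        y[n + p] = e[n] + np.dot(a, past)
    return y[p:]

# Toy "speech": white-noise excitation passed through a stable AR(2) filter.
# The coefficients below are hypothetical and chosen only for the demo.
rng = np.random.default_rng(0)
a_true = np.array([1.3, -0.6])
x = lp_synthesis(rng.standard_normal(2000), a_true)

a_hat = lpc_autocorr(x, order=2)   # estimated "vocal tract" filter
e = lp_residual(x, a_hat)          # estimated "vocal source" (excitation)
x_rec = lp_synthesis(e, a_hat)     # filtering the source rebuilds the signal
```

When the same coefficients are used for analysis and synthesis, the reconstruction matches the input up to floating-point error. If the excitation and the filter are instead produced by separately handled components, nothing guarantees that they fit each other, which is the kind of source-filter mismatch the abstract describes.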

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing