Online Speaker Adaptation for WaveNet-based Neural Vocoders

Qiuchen Huang; Yang Ai; Zhenhua Ling

arXiv:2008.06182 · eess.AS · August 17, 2020 · APSIPA

TL;DR

This paper introduces an online speaker adaptation method for WaveNet vocoders that uses a speaker encoder to improve waveform reconstruction for unseen speakers, enhancing both objective and subjective quality.

Contribution

The paper presents a novel online speaker adaptation approach using a speaker encoder and a speaker-aware WaveNet vocoder for improved speaker-independent speech synthesis.

Findings

01 Better waveform reconstruction for unseen speakers

02 Improved objective and subjective performance

03 Effective adaptation with a speaker encoder

Abstract

In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders to improve their performance on speaker-independent waveform generation. First, a speaker encoder is built using a large speaker-verification dataset; it can extract a speaker embedding vector from an utterance spoken by an arbitrary speaker. At the training stage, a speaker-aware WaveNet vocoder is then built using a multi-speaker dataset, taking both acoustic feature sequences and speaker embedding vectors as conditions. At the generation stage, we first feed the acoustic feature sequence from a test speaker into the speaker encoder to obtain the speaker embedding vector of the utterance. Then, both the speaker embedding vector and the acoustic features are passed through the speaker-aware WaveNet vocoder to reconstruct speech waveforms. Experimental results…
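The generation-stage data flow described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_speaker_encoder` is a hypothetical stand-in (mean pooling plus L2 normalization) and not the paper's trained encoder, and `build_conditions` assumes the utterance-level embedding is repeated for every frame and concatenated with the acoustic features, which the abstract does not specify. The vocoder itself is omitted.

```python
import math
from typing import List

Frame = List[float]  # one frame of acoustic features


def toy_speaker_encoder(frames: List[Frame], embed_dim: int = 2) -> List[float]:
    """Hypothetical stand-in for a speaker encoder.

    Mean-pools the first `embed_dim` feature dimensions over time and
    L2-normalizes the result, giving one vector per utterance.
    """
    if not frames:
        raise ValueError("need at least one frame")
    pooled = [sum(f[d] for f in frames) / len(frames) for d in range(embed_dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0  # avoid division by zero
    return [x / norm for x in pooled]


def build_conditions(frames: List[Frame], speaker_embedding: List[float]) -> List[Frame]:
    """Assumed conditioning scheme: append the same utterance-level
    embedding to every acoustic frame."""
    return [list(f) + list(speaker_embedding) for f in frames]


# Two frames of made-up 3-dimensional acoustic features from a test utterance.
frames = [[3.0, 4.0, 0.5], [3.0, 4.0, 1.5]]

# Step 1: acoustic features -> speaker embedding (no retraining: "online").
embedding = toy_speaker_encoder(frames)

# Step 2: acoustic features + embedding -> per-frame vocoder conditions.
conditions = build_conditions(frames, embedding)
```

The point of the sketch is that adaptation needs only the test utterance's own acoustic features: the embedding is computed at generation time, so no speaker-specific fine-tuning of the vocoder is involved.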

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing