Online Speaker Adaptation for WaveNet-based Neural Vocoders
Qiuchen Huang, Yang Ai, Zhenhua Ling

TL;DR
This paper introduces an online speaker adaptation method for WaveNet vocoders that uses a speaker encoder to improve waveform reconstruction for unseen speakers, enhancing both objective and subjective quality.
Contribution
The paper presents a novel online speaker adaptation approach using a speaker encoder and a speaker-aware WaveNet vocoder for improved speaker-independent speech synthesis.
Findings
Better waveform reconstruction for unseen speakers
Improved objective and subjective performance
Effective adaptation with a speaker encoder
Abstract
In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders in order to improve their performance on speaker-independent waveform generation. In this method, a speaker encoder, which can extract a speaker embedding vector from an utterance spoken by an arbitrary speaker, is first constructed using a large speaker-verification dataset. At the training stage, a speaker-aware WaveNet vocoder, which adopts both acoustic feature sequences and speaker embedding vectors as conditions, is then built using a multi-speaker dataset. At the generation stage, we first feed the acoustic feature sequence from a test speaker into the speaker encoder to obtain the speaker embedding vector of the utterance. Then, both the speaker embedding vector and the acoustic features are passed to the speaker-aware WaveNet vocoder to reconstruct speech waveforms. Experimental results…
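The generation-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_speaker_encoder` is a hypothetical stand-in (mean pooling plus L2 normalization) for the encoder trained on speaker-verification data, and combining the two conditions by per-frame concatenation in `build_conditions` is an assumption, since the abstract does not say how the vocoder consumes them. All function names are made up for this example.

```python
import math
from typing import List, Sequence

Frame = List[float]


def toy_speaker_encoder(acoustic_frames: Sequence[Frame]) -> List[float]:
    """Hypothetical stand-in for the speaker encoder.

    Mean-pools the frames over time and L2-normalizes the result, giving one
    utterance-level vector. The paper's encoder is a trained model; this
    function only mimics its input/output shape.
    """
    if not acoustic_frames:
        raise ValueError("need at least one acoustic frame")
    dim = len(acoustic_frames[0])
    count = len(acoustic_frames)
    mean = [sum(frame[d] for frame in acoustic_frames) / count for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid division by zero
    return [x / norm for x in mean]


def build_conditions(
    acoustic_frames: Sequence[Frame], speaker_embedding: Sequence[float]
) -> List[Frame]:
    """Attach the same utterance-level embedding to every acoustic frame.

    Concatenation is an assumed way of combining the two conditions.
    """
    return [list(frame) + list(speaker_embedding) for frame in acoustic_frames]


def generation_stage_conditions(acoustic_frames: Sequence[Frame]) -> List[Frame]:
    """Online adaptation flow: features -> embedding -> per-frame conditions."""
    embedding = toy_speaker_encoder(acoustic_frames)
    return build_conditions(acoustic_frames, embedding)


if __name__ == "__main__":
    frames = [[3.0, 0.0], [3.0, 8.0]]  # two toy 2-dimensional frames
    for row in generation_stage_conditions(frames):
        print(row)
```

The point of the sketch is that no per-speaker fine-tuning happens at test time: the embedding is computed from the test utterance itself and stays constant across all frames, while the acoustic features vary frame by frame. A real system would hand these conditions to the speaker-aware WaveNet vocoder, which is omitted here.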
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
