Speaker-adaptive neural vocoders for parametric speech synthesis systems
Eunwoo Song, Jin-Seob Kim, Kyungguen Byun, Hong-Goo Kang

TL;DR
This paper introduces speaker-adaptive neural vocoders for parametric TTS systems, enabling high-quality speech synthesis with limited target speaker data by combining universal and speaker-specific modeling.
Contribution
It presents a novel speaker adaptation approach that improves neural vocoder performance in low-data scenarios, outperforming both traditional source-filter vocoders and non-adaptive WaveNet vocoders.
Findings
Achieved high mean opinion scores (MOS) with only 10 minutes of target-speaker training data
Outperformed traditional source-filter and non-adaptive WaveNet vocoders
Effective speaker adaptation improves speech naturalness in low-resource conditions
Abstract
This paper proposes speaker-adaptive neural vocoders for parametric text-to-speech (TTS) systems. Recently proposed WaveNet-based neural vocoding systems successfully generate a time sequence of speech signals with an autoregressive framework. However, it remains a challenge to synthesize high-quality speech when the amount of a target speaker's training data is insufficient. To generate more natural speech signals with the constraint of limited training data, we propose a speaker adaptation task with an effective variation of neural vocoding models. In the proposed method, a speaker-independent training method is applied to capture universal attributes embedded in multiple speakers, and the trained model is then optimized to represent the specific characteristics of the target speaker. Experimental results verify that the proposed TTS systems with speaker-adaptive neural vocoders…
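The two-stage procedure in the abstract (speaker-independent training on many speakers, then optimization toward one target speaker) can be illustrated with a toy sketch. This is not the authors' WaveNet vocoder: the one-tap linear predictor, the synthetic `make_speaker` signals, and every name below are hypothetical stand-ins chosen only to show the data flow of pre-training on pooled data and then adapting from those weights with little target data. With a small adaptation budget the speaker-independent start typically reaches a lower target loss than training from scratch.

```python
import random

def make_speaker(coef, n, seed):
    """Hypothetical toy 'speaker': x[t] = coef * x[t-1] + uniform noise in [-1, 1].
    Returns autoregressive training pairs (x[t-1], x[t])."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(coef * x[-1] + rng.uniform(-1.0, 1.0))
    return list(zip(x[:-1], x[1:]))

def mse(params, pairs):
    w, b = params
    return sum((w * p + b - y) ** 2 for p, y in pairs) / len(pairs)

def safe_lr(pairs):
    # 1 / trace(Hessian) of the MSE loss; the trace bounds the largest
    # eigenvalue, so each full-batch gradient step cannot increase the loss.
    m2 = sum(p * p for p, _ in pairs) / len(pairs)
    return 1.0 / (2.0 * (m2 + 1.0))

def train(params, pairs, steps):
    """Full-batch gradient descent for the one-tap predictor y ~ w*p + b."""
    w, b = params
    n, lr = len(pairs), safe_lr(pairs)
    for _ in range(steps):
        gw = sum(2.0 * (w * p + b - y) * p for p, y in pairs) / n
        gb = sum(2.0 * (w * p + b - y) for p, y in pairs) / n
        w, b = w - lr * gw, b - lr * gb
    return (w, b)

# Stage 1: speaker-independent (SI) training on pooled multi-speaker data.
pooled = []
for seed, coef in enumerate([0.5, 0.6, 0.7]):
    pooled += make_speaker(coef, 1000, seed)
init = (0.0, 0.0)
si_params = train(init, pooled, steps=100)

# Stage 2: adapt the SI weights using only a little target-speaker data and
# a small step budget; compare with training from scratch on the same data.
target = make_speaker(0.9, 200, seed=42)
adapted_params = train(si_params, target, steps=3)
scratch_params = train(init, target, steps=3)

print("SI model on target:      ", mse(si_params, target))
print("Adapted model on target: ", mse(adapted_params, target))
print("Scratch model on target: ", mse(scratch_params, target))
```

Because this toy model is convex, the benefit of the speaker-independent initialization shows up only when adaptation steps are few; the real vocoder is a deep non-convex network, so this sketch illustrates the procedure, not the reported quality gains.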
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet
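The Methods entry lists dilated causal convolution, the WaveNet building block. A minimal pure-Python sketch of that operation follows, assuming kernel size 2 as in the original WaveNet design. The function names are hypothetical, and the sketch does not reproduce the specific layer configuration used in this paper. It shows that each output sample depends only on current and past inputs, and that stacking layers with doubling dilations grows the receptive field exponentially.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - i*dilation], with x[t] = 0 for t < 0.
    Each output depends only on the current and past inputs (causal)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    # Each layer extends the context by (kernel_size - 1) * dilation samples.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# One WaveNet-style block with kernel size 2 and dilations 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)]
print(receptive_field(2, dilations))  # 1024

# Impulse response of a small stack (dilations 1, 2, 4) with all-ones weights:
# the impulse spreads forward in time over exactly receptive_field(2, [1, 2, 4])
# samples and never backward.
signal = [1.0] + [0.0] * 15
for d in [1, 2, 4]:
    signal = dilated_causal_conv1d(signal, [1.0, 1.0], d)
print(signal)  # eight 1.0 values followed by eight 0.0 values
```

Doubling the dilation per layer keeps the parameter count linear in depth while the context window doubles with each layer, which is why autoregressive vocoders of this type can condition each sample on hundreds of past samples.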
