Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng, Zhenhua Ling

TL;DR
This paper introduces a speaker adaptation method for statistical parametric speech synthesis that incorporates intuitive prosodic features like pitch and energy, improving the naturalness and speaker similarity of synthesized speech.
Contribution
It proposes a novel integration of intuitive prosodic features into existing speaker adaptation frameworks for sequence-to-sequence models like Tacotron2.
Findings
Enhanced objective and subjective performance over baseline methods.
Utterance-level prosodic features yield the highest speech similarity.
Improved naturalness and speaker similarity in synthesized speech.
Abstract
In this paper, we propose a method of speaker adaption with intuitive prosodic features for statistical parametric speech synthesis. The intuitive prosodic features employed in this method include pitch, pitch range, speech rate and energy, considering that they are directly related to the overall prosodic characteristics of different speakers. The intuitive prosodic features are extracted at utterance-level or speaker-level, and are further integrated into the existing speaker-encoding-based and speaker-embedding-based adaptation frameworks, respectively. The acoustic models are sequence-to-sequence ones based on Tacotron2. Intuitive prosodic features are concatenated with text encoder outputs and speaker vectors for decoding acoustic features. Experimental results have demonstrated that our proposed methods can achieve better objective and subjective performance than the baseline…
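The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `build_decoder_inputs`, the array sizes, and the choice to tile one utterance-level vector across all encoder time steps are assumptions made for illustration. The four prosodic values stand in for pitch, pitch range, speech rate and energy, and the numbers are placeholders.

```python
import numpy as np


def build_decoder_inputs(encoder_outputs, speaker_vector, prosodic_features):
    """Concatenate per-step text encoder outputs with a speaker vector and
    a prosodic feature vector (hypothetical helper, not from the paper).

    encoder_outputs:   shape (num_steps, encoder_dim)
    speaker_vector:    shape (speaker_dim,)
    prosodic_features: shape (num_prosodic,), e.g. pitch, pitch range,
                       speech rate and energy for one utterance or speaker
    """
    encoder_outputs = np.asarray(encoder_outputs, dtype=np.float32)
    speaker_vector = np.asarray(speaker_vector, dtype=np.float32)
    prosodic_features = np.asarray(prosodic_features, dtype=np.float32)

    num_steps = encoder_outputs.shape[0]
    # Assumption: the same conditioning vector is repeated at every step.
    conditioning = np.concatenate([speaker_vector, prosodic_features])
    tiled = np.tile(conditioning, (num_steps, 1))
    # Result shape: (num_steps, encoder_dim + speaker_dim + num_prosodic)
    return np.concatenate([encoder_outputs, tiled], axis=-1)


# Toy example with placeholder sizes and values.
encoder_outputs = np.zeros((5, 8), dtype=np.float32)
speaker_vector = np.ones(3, dtype=np.float32)
prosodic_features = np.array([0.2, -0.1, 0.5, 0.0], dtype=np.float32)

decoder_inputs = build_decoder_inputs(
    encoder_outputs, speaker_vector, prosodic_features
)
print(decoder_inputs.shape)
```

In a Tacotron2-style model, the attention-based decoder would then attend over these extended vectors instead of the raw text encoder outputs. How the features are normalized before concatenation is not stated in the text above.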
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
