Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou, Alexandra Vioni, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Panos Kakoulidis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

TL;DR
This paper introduces an improved multispeaker prosody control method using prosodic clustering, data augmentation, and normalization, enabling fine-grained phoneme-level control while preserving speaker identity and quality.
Contribution
It proposes novel enhancements for multispeaker prosody control, including speaker-independent clustering and data augmentation, improving control range and generalization to unseen speakers.
Findings
Enhanced prosody control range and coverage.
Maintains high speech quality across speakers.
Effective control for unseen speakers with limited data.
Abstract
This paper presents a method for phoneme-level prosody control of F0 and duration on a multispeaker text-to-speech setup, which is based on prosodic clustering. An autoregressive attention-based model is used, incorporating multispeaker architecture modules in parallel to a prosody encoder. Several improvements over the basic single-speaker method are proposed that increase the prosodic control range and coverage. More specifically, we employ data augmentation, F0 normalization, balanced clustering for duration, and speaker-independent prosodic clustering. These modifications enable fine-grained phoneme-level prosody control for all speakers contained in the training set, while maintaining the speaker identity. The model is also fine-tuned to unseen speakers with limited amounts of data and it is shown to maintain its prosody control capabilities, verifying that the speaker-independent…
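To make the abstract's terms concrete, the following is a minimal sketch of how per-speaker F0 normalization, balanced clustering, and speaker-independent labels could fit together. It is not the authors' implementation. The z-score normalization, the equal-frequency binning, and the function names `normalize_f0_per_speaker`, `balanced_bins`, and `speaker_independent_labels` are all assumptions made for illustration; the abstract does not specify the exact procedures.

```python
from statistics import mean, pstdev


def normalize_f0_per_speaker(f0_by_speaker):
    """Z-score each speaker's phoneme-level F0 values with that speaker's own
    mean and standard deviation (assumed normalization scheme)."""
    normalized = {}
    for speaker, values in f0_by_speaker.items():
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # guard against a constant-F0 speaker
        normalized[speaker] = [(v - mu) / sigma for v in values]
    return normalized


def balanced_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins, ordered from
    low to high (assumed stand-in for balanced clustering). Tied values may
    fall into adjacent bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


def speaker_independent_labels(f0_by_speaker, n_bins):
    """Pool normalized F0 from all speakers and bin the pooled values, so every
    speaker shares one set of discrete prosody labels."""
    normalized = normalize_f0_per_speaker(f0_by_speaker)
    speakers = list(normalized)
    pooled = [v for s in speakers for v in normalized[s]]
    pooled_labels = balanced_bins(pooled, n_bins)
    # Split the pooled label list back into per-speaker lists.
    result, start = {}, 0
    for s in speakers:
        end = start + len(normalized[s])
        result[s] = pooled_labels[start:end]
        start = end
    return result


if __name__ == "__main__":
    # Hypothetical phoneme-level F0 values in Hz for two speakers.
    f0 = {"spk_a": [100.0, 120.0, 140.0], "spk_b": [200.0, 240.0, 280.0]}
    print(speaker_independent_labels(f0, n_bins=3))
    # Hypothetical phoneme durations in seconds.
    print(balanced_bins([0.05, 0.20, 0.10, 0.40, 0.08, 0.30], n_bins=3))
```

Because the labels are computed on normalized, pooled values, a given label denotes the same relative pitch level for every speaker. In this sketch, that shared meaning is what would let a label chosen at inference time apply to any speaker.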
