Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

TL;DR
This paper presents StyleMoE, a novel mixture-of-experts (MoE) approach to expressive text-to-speech synthesis. By learning specialized style representations, it improves style transfer, especially for diverse and unseen reference speech.
Contribution
It introduces StyleMoE, the first application of mixture of experts in TTS, to enhance style encoding and transfer for more expressive speech synthesis.
Findings
Improved style transfer for diverse and unseen speech references.
Enhanced style coverage in TTS models.
Outperforms existing state-of-the-art style transfer methods.
Abstract
Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. However, encoding stylistic information (e.g., timbre, emotion, and prosody) from diverse and unseen reference speech remains a challenge. This paper introduces StyleMoE, an approach that addresses the issue of learning averaged style representations in the style encoder by creating style experts that learn from subsets of data. The proposed method replaces the style encoder in a TTS framework with a Mixture of Experts (MoE) layer. The style experts specialize by learning from subsets of reference speech routed to them by the gating network, enabling them to handle different aspects of the style space. As a result, StyleMoE improves the style coverage of the style encoder for style transfer TTS. Our experiments, both objective and subjective, demonstrate improved style transfer…
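The routing idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the reference speech is assumed to be already pooled into a fixed-length feature vector, each expert is reduced to a single linear map (a real style expert would be a full style encoder), and sparse top-k gating is assumed as the routing rule. All names here (`StyleMoESketch`, `route`, `encode`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class StyleMoESketch:
    """Toy MoE layer used in place of a single style encoder (hypothetical)."""

    def __init__(self, in_dim, style_dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Gating network: one logit per expert for a given reference vector.
        self.gate = make_matrix(num_experts, in_dim, rng)
        # Each expert is a single linear map here (assumption for brevity).
        self.experts = [make_matrix(style_dim, in_dim, rng)
                        for _ in range(num_experts)]

    def route(self, ref):
        """Pick the top-k experts for `ref` and normalise their gate weights."""
        logits = linear(self.gate, ref)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[:self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def encode(self, ref):
        """Style embedding = gate-weighted sum of the chosen experts' outputs."""
        chosen, weights = self.route(ref)
        style = [0.0] * len(self.experts[0])
        for idx, w in zip(chosen, weights):
            out = linear(self.experts[idx], ref)
            style = [s + w * o for s, o in zip(style, out)]
        return style, chosen, weights


# Usage: route one pooled reference vector through 2 of 4 experts.
moe = StyleMoESketch(in_dim=8, style_dim=4, num_experts=4, top_k=2)
reference = [0.1 * i for i in range(8)]
style, chosen, weights = moe.encode(reference)
print("experts used:", chosen, "gate weights:", weights)
```

Because only the selected experts process a given reference, each expert would receive gradient from just the subset of data routed to it during training, which is the mechanism the abstract credits for specialization.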
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Mixture of Experts
