MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
Heyang Xue, Xuchen Song, Yu Tang, Jianyu Chen, Yanru Chen, Yang Li, Yahui Zhou

TL;DR
MoE-TTS introduces a mixture-of-experts approach to improve out-of-domain text understanding in description-based TTS, leveraging pre-trained language models with specialized speech modality experts for more accurate speech synthesis.
Contribution
The paper proposes MoE-TTS, a novel mixture-of-experts model that enhances out-of-domain text understanding in TTS by augmenting pre-trained LLMs with speech-specific experts while keeping the core LLM frozen.
Findings
MoE-TTS outperforms existing models on out-of-domain description tests.
Commercial TTS systems struggle with carefully designed out-of-domain inputs.
MoE-TTS generates speech that more accurately reflects diverse descriptions.
Abstract
Description-based text-to-speech (TTS) models exhibit strong performance on in-domain text descriptions, i.e., those encountered during training. However, in real-world applications, the diverse range of user-generated descriptions inevitably introduces numerous out-of-domain inputs that challenge the text understanding capabilities of these systems. To address this issue, we propose MoE-TTS, a description-based TTS model designed to enhance the understanding of out-of-domain text descriptions. MoE-TTS employs a modality-based mixture-of-experts (MoE) approach to augment a pre-trained textual large language model (LLM) with a set of specialized weights adapted to the speech modality while keeping the original LLM frozen during training. This approach allows MoE-TTS to effectively leverage the pre-trained knowledge and text understanding abilities of textual LLMs. Our experimental…
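The modality-based routing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Expert`, `ModalityMoELayer`, `TEXT`, and `SPEECH` are hypothetical, the layer is a single toy linear map in pure Python, and initializing the speech expert as a copy of the text weights is an assumption made only for illustration. The sketch shows the two properties the abstract states: tokens are dispatched to an expert by modality rather than by a learned gate, and only the speech expert's weights change during training while the text (LLM) weights stay frozen.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical modality ids; not taken from the paper.
TEXT, SPEECH = 0, 1


@dataclass
class Expert:
    weight: List[List[float]]  # shape (d_out, d_in)
    trainable: bool

    def forward(self, x: List[float]) -> List[float]:
        # Plain matrix-vector product.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]

    def sgd_step(self, x: List[float], grad_out: List[float], lr: float) -> None:
        # Frozen experts ignore updates entirely.
        if not self.trainable:
            return
        for i, g in enumerate(grad_out):
            for j, xj in enumerate(x):
                self.weight[i][j] -= lr * g * xj


class ModalityMoELayer:
    """Routes each token to an expert chosen by its modality tag."""

    def __init__(self, text_weight: List[List[float]]):
        self.experts = {
            # Stands in for the pre-trained LLM weights: frozen.
            TEXT: Expert([row[:] for row in text_weight], trainable=False),
            # Speech expert: trainable. Copying the text weights as the
            # starting point is an assumption of this sketch.
            SPEECH: Expert([row[:] for row in text_weight], trainable=True),
        }

    def forward(self, tokens, modalities):
        return [self.experts[m].forward(x) for x, m in zip(tokens, modalities)]

    def sgd_step(self, tokens, modalities, grads, lr=0.1):
        for x, m, g in zip(tokens, modalities, grads):
            self.experts[m].sgd_step(x, g, lr)


if __name__ == "__main__":
    layer = ModalityMoELayer([[1.0, 0.0], [0.0, 1.0]])
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    modalities = [TEXT, SPEECH]  # one description token, one speech token
    print(layer.forward(tokens, modalities))
    layer.sgd_step(tokens, modalities, grads=[[1.0, 1.0], [1.0, 1.0]])
    print(layer.experts[TEXT].weight)    # unchanged: frozen
    print(layer.experts[SPEECH].weight)  # updated by the gradient step
```

Because routing depends only on the modality tag, description tokens always pass through the untouched text weights, which is one plausible reading of how the pre-trained text understanding would be preserved for out-of-domain descriptions.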
