MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
Ya-Jie Zhang, Wei Song, Yanghao Yue, Zhengchen Zhang, Youzheng Wu, Xiaodong He

TL;DR
MaskedSpeech is a novel context-aware speech synthesis system that leverages masking strategies to incorporate semantic and acoustic context, significantly enhancing naturalness and expressiveness in paragraph-level speech generation.
Contribution
This paper introduces MaskedSpeech, a new speech synthesis approach that integrates contextual features using masking strategies, improving paragraph-level speech quality.
Findings
Outperforms baseline in naturalness and expressiveness
Effectively incorporates contextual semantic and acoustic features
Enhances paragraph-level speech synthesis quality
Abstract
Humans often speak in a continuous manner which leads to coherent and consistent prosody properties across neighboring utterances. However, most state-of-the-art speech synthesis systems only consider the information within each sentence and ignore the contextual semantic and acoustic features. This makes it inadequate to generate high-quality paragraph-level speech which requires high expressiveness and naturalness. To synthesize natural and expressive speech for a paragraph, a context-aware speech synthesis system named MaskedSpeech is proposed in this paper, which considers both contextual semantic and acoustic features. Inspired by the masking strategy in the speech editing research, the acoustic features of the current sentence are masked out and concatenated with those of contextual speech, and further used as additional model input. The phoneme encoder takes the concatenated…
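The masking step described in the abstract (mask out the current sentence's acoustic features, then concatenate them with those of the contextual speech) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `build_masked_acoustic_input`, the use of one preceding and one following context utterance, the frame layout (a list of per-frame feature vectors), and the zero mask value are all assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[float]  # one acoustic frame, e.g. a mel-spectrogram column (assumed layout)


def build_masked_acoustic_input(
    prev_frames: List[Frame],
    cur_frames: List[Frame],
    next_frames: List[Frame],
    mask_value: float = 0.0,  # assumption: the source does not state the mask value
) -> Tuple[List[Frame], List[bool]]:
    """Mask the current sentence's frames and concatenate them with context frames.

    Returns the concatenated frame sequence and a per-frame flag that is True
    for frames belonging to the masked (current) sentence.
    """
    # Replace every frame of the current sentence with a constant mask frame.
    masked_cur = [[mask_value] * len(frame) for frame in cur_frames]

    # Concatenate along the time axis: context, masked current sentence, context.
    features = (
        [list(frame) for frame in prev_frames]
        + masked_cur
        + [list(frame) for frame in next_frames]
    )

    # Record which positions were masked so a model could be told what to predict.
    is_masked = (
        [False] * len(prev_frames)
        + [True] * len(cur_frames)
        + [False] * len(next_frames)
    )
    return features, is_masked


# Toy example with 2-dimensional frames (real acoustic features are much wider).
prev_frames = [[0.1, 0.2], [0.3, 0.4]]
cur_frames = [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
next_frames = [[1.1, 1.2]]

features, is_masked = build_masked_acoustic_input(prev_frames, cur_frames, next_frames)
```

In this sketch the context frames pass through unchanged, so a model receiving `features` can see the acoustics of the neighboring utterances while the current sentence's acoustics remain hidden.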
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems
