Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

Jie Chen; Changhe Song; Deyi Tuo; Xixin Wu; Shiyin Kang; Zhiyong Wu; Helen Meng

arXiv:2308.16577 · cs.SD · September 1, 2023

TL;DR

This paper introduces a hierarchical encoder that leverages multi-level inter- and intra-utterance linguistic context to enhance prosodic structure prediction in Mandarin TTS, resulting in more natural speech synthesis.

Contribution

The paper proposes a novel multi-level contextual encoding approach, combined with multi-task learning, to improve prosodic boundary prediction in Mandarin TTS.
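The multi-task learning idea can be illustrated with a minimal sketch of a combined training objective. It rests on assumptions this page does not state: that the decoder predicts boundaries for the commonly used three-tier Mandarin prosodic hierarchy (prosodic word, prosodic phrase, intonational phrase, given the hypothetical labels `PW`, `PPH`, `IPH`), that each tier is a binary boundary/no-boundary decision per character, and that the per-tier losses are combined as a weighted sum. The function names and weights are illustrative only, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical tier labels for an ASSUMED three-tier prosodic hierarchy:
# prosodic word (PW), prosodic phrase (PPH), intonational phrase (IPH).
TIERS = ("PW", "PPH", "IPH")


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over character positions for one tier."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def multi_task_loss(
    probs: Dict[str, List[float]],
    labels: Dict[str, List[int]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-tier losses; treating each tier as one task is an assumption."""
    return sum(weights[t] * binary_cross_entropy(probs[t], labels[t]) for t in TIERS)


# Usage: two characters, identical toy predictions for every tier.
probs = {t: [0.9, 0.1] for t in TIERS}
labels = {t: [1, 0] for t in TIERS}
weights = {t: 1.0 for t in TIERS}
loss = multi_task_loss(probs, labels, weights)  # 3 * -ln(0.9), about 0.316
```

Sharing one encoder across several boundary tiers while summing their losses is the usual motivation for MTL here: each tier's supervision can shape the shared representation. The weights are a tuning knob whose values this page does not report.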

Findings

01. Higher F1 scores in prosodic boundary prediction
02. Improved naturalness of synthesized speech
03. Effective use of inter-utterance information
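In its usual form, boundary-prediction F1 is the harmonic mean of precision and recall over predicted boundary positions, F1 = 2PR / (P + R). The sketch below computes it for one prosodic tier. It assumes per-character binary labels where 1 marks a boundary after that character. The label encoding and the function name `boundary_f1` are hypothetical, and this page does not give the paper's exact evaluation protocol.

```python
from typing import Sequence, Tuple


def boundary_f1(pred: Sequence[int], gold: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for binary boundary labels (1 = boundary after this character)."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)  # correctly predicted boundaries
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)  # spurious boundaries
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)  # missed boundaries
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: one spurious and one missed boundary out of three gold boundaries.
p, r, f = boundary_f1(pred=[0, 1, 1, 0, 1], gold=[0, 1, 0, 1, 1])
# precision, recall and F1 are all 2/3 here
```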

Abstract

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance, previous work on PSP has mainly focused on utilizing only the intra-utterance linguistic information of the current utterance. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intra-utterance linguistic information, is extracted by a hierarchical encoder at the character, utterance and discourse levels of the input text. A multi-task learning (MTL) decoder then predicts prosodic boundaries from this multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1…
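A toy sketch can illustrate the three-level extraction described in the abstract. Here the learned encoders are replaced by mean pooling, which is a deliberate simplification and not the paper's architecture. Character vectors come from a caller-supplied `embed_char` function. An utterance vector is the mean of its character vectors. A discourse vector is the mean of the utterance vectors within a hypothetical `window` of neighboring utterances, which is where inter-utterance context enters. Each character of the target utterance then receives the concatenation of all three levels. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def multi_level_features(
    utterances: Sequence[str],
    target: int,
    embed_char: Callable[[str], Vector],
    window: int = 1,
) -> List[Vector]:
    """Return [char ; utterance ; discourse] features for each character of utterances[target]."""
    # Character level: embed every character of every (non-empty) utterance.
    char_vecs = [[embed_char(c) for c in utt] for utt in utterances]
    # Utterance level (intra-utterance context): mean pooling stands in for a learned encoder.
    utt_vecs = [mean_pool(vs) for vs in char_vecs]
    # Discourse level (inter-utterance context): pool the utterance vectors
    # within `window` positions of the target, including the target itself.
    lo = max(0, target - window)
    hi = min(len(utterances), target + window + 1)
    disc_vec = mean_pool(utt_vecs[lo:hi])
    # Concatenate the three levels for every character of the target utterance.
    return [cv + utt_vecs[target] + disc_vec for cv in char_vecs[target]]


def toy_embed(c: str) -> Vector:
    """Hypothetical one-dimensional character embedding, for illustration only."""
    return [float(ord(c) - ord("a"))]


feats = multi_level_features(["ab", "cd", "ef"], target=0, embed_char=toy_embed)
# feats == [[0.0, 0.5, 1.5], [1.0, 0.5, 1.5]]
```

In a real system each pooling step would presumably be a trained network, and the concatenated features would feed the boundary decoder. The sketch only shows how inter-utterance context reaches every character position of the target utterance.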
