Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation

Yuanhao Chen

arXiv:2307.16199 · cs.CL · August 1, 2023

TL;DR

This paper improves Shanghainese TTS by incorporating word segmentation and syllable annotation to better model tone sandhi, addressing limitations of previous Mandarin-based approaches and enhancing speech naturalness.

Contribution

It introduces a novel preprocessing method using word segmentation and syllable annotation to better capture tone sandhi in Shanghainese TTS models.

Findings

01

Word segmentation improves tone sandhi accuracy in TTS.

02

Syllable annotation serves as a proxy for prosodic information.

03

Prosodic annotation can model dynamic tonal phenomena.

Abstract

Tone is a crucial component of the prosody of Shanghainese, a Wu Chinese variety spoken primarily in urban Shanghai. Tone sandhi, which applies to all multi-syllabic words in Shanghainese, is therefore key to natural-sounding speech. Unfortunately, recent Shanghainese TTS (text-to-speech) systems such as Apple's VoiceOver have performed poorly on tone sandhi, especially LD (left-dominant) sandhi. Here I show that word segmentation during text preprocessing can improve the quality of tone sandhi production in TTS models. Syllables within the same word are annotated with a special symbol, which serves as a proxy for prosodic information about the domain of LD. Contrary to the common practice of using prosodic annotation mainly for static pauses, this paper demonstrates that prosodic annotation can also be applied to dynamic tonal phenomena. I anticipate this project to be a starting point…
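The preprocessing idea above (segment the text into words, then mark syllables that share a word so the model can see the LD domain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes input written in Chinese characters with one character per syllable, swaps in a simple forward-maximum-matching segmenter over a toy lexicon, and joins syllables of the same word with a hypothetical `#` marker. The segmenter, marker symbol, and annotation format used in the paper's repository may differ.

```python
from typing import List, Set

# Hypothetical marker joining syllables of one word (one LD domain).
# The paper's actual symbol is not specified here.
LD_MARKER = "#"


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[List[str]]:
    """Greedy forward maximum matching: at each position, take the longest
    lexicon entry (up to max_len characters); fall back to a single character.
    Returns a list of words, each word a list of syllables (characters)."""
    words: List[List[str]] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted so the loop always advances.
            if length == 1 or candidate in lexicon:
                words.append(list(candidate))
                i += length
                break
    return words


def annotate_ld_domains(words: List[List[str]], marker: str = LD_MARKER) -> str:
    """Join syllables inside each word with `marker` and separate words with
    spaces, so a TTS front end can see where each sandhi domain begins and ends."""
    return " ".join(marker.join(syllables) for syllables in words if syllables)


if __name__ == "__main__":
    toy_lexicon = {"上海", "闲话"}  # hypothetical toy lexicon
    segmented = forward_max_match("我讲上海闲话", toy_lexicon)
    print(annotate_ld_domains(segmented))  # prints: 我 讲 上#海 闲#话
```

Forward maximum matching is only a stand-in chosen because it is short and dependency-free. A real Shanghainese pipeline would need a lexicon or segmenter that reflects Shanghainese word boundaries, since those determine where LD sandhi applies.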

Code & Models

Repositories

edward-martyr/shanghainese-tts (official)

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Natural Language Processing Techniques