Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Chunyu Qiang; Jianhua Tao; Ruibo Fu; Zhengqi Wen; Jiangyan Yi; Tao Wang; Shiming Wang

arXiv:2210.11429 · cs.SD · October 21, 2022

TL;DR

This paper introduces text enhancement and cross-lingual embedding techniques to improve naturalness, consistency, and prosody stability in end-to-end code-switching TTS systems across multiple language pairs.

Contribution

It proposes novel methods for text enhancement and cross-lingual embeddings that significantly improve code-switching TTS quality and can be extended to various languages.

Findings

01. Enhanced naturalness and consistency in code-switching speech

02. Improved prosody stability in paragraph synthesis

03. Effective across multiple language pairs including Mandarin-English, Shanghainese, and Cantonese

Abstract

Current end-to-end code-switching Text-to-Speech (TTS) systems can already generate high-quality speech that mixes two languages in the same utterance when trained on single-speaker bilingual corpora. When the bilingual corpora come from different speakers, however, the naturalness and consistency of code-switching TTS are poor. The proposed cross-lingual embedding layer structure relates similar syllables across languages, improving the naturalness and consistency of the generated speech. End-to-end code-switching TTS also suffers from prosody instability when synthesizing paragraph text. The proposed text enhancement method adds prosodic information and sentence-level context information to the input, improving the prosody stability of paragraph text. Experimental results demonstrate the effectiveness of the proposed methods in the naturalness, consistency, and prosody…

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems