Applying Phonological Features in Multilingual Text-To-Speech

Cong Zhang; Huinan Zeng; Huang Liu; Jiewen Zheng

arXiv:2110.03609 · cs.CL · October 12, 2021

TL;DR

This paper explores the use of phonological features in multilingual text-to-speech systems to generate native, non-native, and code-switched speech in English and Mandarin. It demonstrates that the approach is feasible and offers insights into second language acquisition.

Contribution

The paper introduces a mapping from ARPABET/pinyin, via SAMPA/SAMPA-SC, to phonological features, and demonstrates its application in multilingual TTS for native and non-native speech synthesis.

Findings

01

Phonological features can be used as input for multilingual TTS.

02

Generated speech shows accented qualities relevant to second language learning.

03

Phonological features are confirmed as a feasible TTS input, although model performance leaves room for improvement.

Abstract

This study investigates whether phonological features can be applied in text-to-speech systems to generate native and non-native speech in English and Mandarin. We present a mapping of ARPABET/pinyin to SAMPA/SAMPA-SC and then to phonological features. We tested whether this mapping could lead to the successful generation of native, non-native, and code-switched speech in the two languages. We ran two experiments, one with a small dataset and one with a larger dataset. The results showed that phonological features are a feasible input system, although further investigation is needed to improve model performance. The accented output generated by the TTS models also helps with understanding human second language acquisition processes.
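The two-step mapping described in the abstract (ARPABET/pinyin to SAMPA/SAMPA-SC, then to phonological features) can be sketched as a pair of lookup tables followed by a binary feature encoding. This is a minimal illustration, not the code in congzhang365/feature_tts: the ten-feature inventory, the handful of phones, the symbol choices, and the names FEATURES, to_vector and encode are all hypothetical, and pinyin is assumed to be already split into segments.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical binary feature inventory (illustrative only; not the paper's feature set).
FEATURES: Tuple[str, ...] = (
    "consonantal", "sonorant", "voice", "nasal", "continuant",
    "labial", "coronal", "high", "low", "back",
)

# Step 1a: ARPABET -> SAMPA (small illustrative subset of English phones).
ARPABET_TO_SAMPA: Dict[str, str] = {"M": "m", "B": "b", "S": "s", "IY": "i:", "AA": "A"}

# Step 1b: pinyin segment -> SAMPA-SC (small illustrative subset; pinyin "b" is voiceless /p/).
PINYIN_TO_SAMPA_SC: Dict[str, str] = {"m": "m", "b": "p", "s": "s", "i": "i", "a": "a"}

# Step 2: SAMPA / SAMPA-SC symbol -> set of positive ("+") features; all others are "-".
SAMPA_TO_FEATURES: Dict[str, FrozenSet[str]] = {
    "m": frozenset({"consonantal", "sonorant", "voice", "nasal", "labial"}),
    "b": frozenset({"consonantal", "voice", "labial"}),
    "p": frozenset({"consonantal", "labial"}),
    "s": frozenset({"consonantal", "continuant", "coronal"}),
    "i:": frozenset({"sonorant", "voice", "continuant", "high"}),
    "i": frozenset({"sonorant", "voice", "continuant", "high"}),
    "A": frozenset({"sonorant", "voice", "continuant", "low", "back"}),
    "a": frozenset({"sonorant", "voice", "continuant", "low"}),
}


def to_vector(sampa: str) -> List[int]:
    """Encode one SAMPA/SAMPA-SC symbol as a 0/1 vector ordered like FEATURES."""
    positive = SAMPA_TO_FEATURES[sampa]
    return [1 if f in positive else 0 for f in FEATURES]


def encode(tokens: List[Tuple[str, str]]) -> List[List[int]]:
    """Map (language, phone) pairs to feature vectors via the two-step mapping.

    The language tag is "en" (ARPABET input) or "zh" (pinyin segment input),
    so a code-switched utterance can mix both in one sequence.
    """
    vectors = []
    for lang, phone in tokens:
        if lang == "en":
            sampa = ARPABET_TO_SAMPA[phone]
        elif lang == "zh":
            sampa = PINYIN_TO_SAMPA_SC[phone]
        else:
            raise ValueError(f"unknown language tag: {lang!r}")
        vectors.append(to_vector(sampa))
    return vectors


if __name__ == "__main__":
    # English "me" (M IY) followed by Mandarin "ma" (m a) in one input sequence.
    utterance = [("en", "M"), ("en", "IY"), ("zh", "m"), ("zh", "a")]
    for row in encode(utterance):
        print(row)
```

Because both languages end up in the same feature space, English M and Mandarin m receive identical vectors in this sketch, which is one way a single model could accept native, non-native, and code-switched input. Vowel length is not encoded here, so SAMPA i: and i also collapse to the same vector; a realistic feature set would likely need more dimensions.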

Code & Models

Repositories

congzhang365/feature_tts

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research