Applying Phonological Features in Multilingual Text-To-Speech
Cong Zhang, Huinan Zeng, Huang Liu, Jiewen Zheng

TL;DR
This paper explores whether phonological features can serve as input to multilingual text-to-speech systems that generate native, non-native, and code-switched speech in English and Mandarin. It shows that the approach is feasible and that the accented output offers insights into second language acquisition.
Contribution
The paper introduces a mapping from ARPABET/pinyin, via SAMPA/SAMPA-SC, to phonological features and applies it in multilingual TTS to synthesize native and non-native speech.
Findings
Phonological features can be used as input for multilingual TTS.
Generated speech shows accented qualities relevant to second language learning.
Feasibility of phonological features in TTS is confirmed, with room for performance improvements.
Abstract
This study investigates whether phonological features can be applied in text-to-speech systems to generate native and non-native speech in English and Mandarin. We present a mapping of ARPABET/pinyin to SAMPA/SAMPA-SC and then to phonological features. We tested whether this mapping could lead to the successful generation of native, non-native, and code-switched speech in the two languages. We ran two experiments, one with a small dataset and one with a larger dataset. The results showed that phonological features are a feasible input system, although further investigation is needed to improve model performance. The accented output generated by the TTS models also helps with understanding human second language acquisition processes.
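The two-step mapping described in the abstract (ARPABET or pinyin to SAMPA/SAMPA-SC, then to phonological features) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature inventory, the three-symbol lookup tables, and the function name `phones_to_vectors` are hypothetical and are not taken from the paper, whose actual tables and feature set are not reproduced here.

```python
# Illustrative sketch only: the feature set and lookup tables below are
# hypothetical toy examples, not the mapping published in the paper.

# A small, made-up inventory of binary phonological features.
FEATURES = ("syllabic", "consonantal", "voiced", "nasal", "labial", "high")

# Step 1: language-specific symbols -> SAMPA-style symbols (toy subsets).
ARPABET_TO_SAMPA = {"P": "p", "M": "m", "IY": "i:"}
PINYIN_TO_SAMPA = {"b": "p", "m": "m", "i": "i"}

# Step 2: SAMPA-style symbols -> set of active features (toy values).
SAMPA_TO_FEATURES = {
    "p": {"consonantal", "labial"},
    "m": {"consonantal", "voiced", "nasal", "labial"},
    "i": {"syllabic", "voiced", "high"},
    "i:": {"syllabic", "voiced", "high"},
}


def phones_to_vectors(phones, symbol_table):
    """Convert a phone sequence to binary feature vectors via SAMPA."""
    vectors = []
    for phone in phones:
        sampa = symbol_table[phone]        # step 1: symbol -> SAMPA
        active = SAMPA_TO_FEATURES[sampa]  # step 2: SAMPA -> features
        vectors.append([1 if name in active else 0 for name in FEATURES])
    return vectors


english = phones_to_vectors(["M", "IY"], ARPABET_TO_SAMPA)
mandarin = phones_to_vectors(["m", "i"], PINYIN_TO_SAMPA)
```

In this toy setup both inputs end up in the same feature space, which is the property that would let a single model take either language as input. Note that these six features do not encode vowel length, so `i:` and `i` collapse to the same vector; a realistic inventory would need more features.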
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research
