Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech
Cong Zhang, Huinan Zeng, Huang Liu, Jiewen Zheng

TL;DR
This paper explores using phonological features from the Featurally Underspecified Lexicon (FUL) model in multilingual text-to-speech (TTS) systems to generate native, non-native, and code-switched speech in English and Mandarin, and shows that the approach is feasible for both languages.
Contribution
It introduces a mapping from ARPABET (English) and pinyin (Mandarin) to SAMPA/SAMPA-SC and then to phonological features, and shows that these features can be used to synthesize speech in a language absent from the training data, which supports the FUL model.
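To make the two-stage mapping concrete, here is a minimal Python sketch. It is not the authors' mapping: the tables cover only a handful of symbols, the feature labels (including the hypothetical `SPREAD_GLOTTIS`) and their assignments are illustrative assumptions, and the helpers `strip_stress` and `to_features` are hypothetical names. It also assumes pinyin has already been split into initials and finals.

```python
import re

# Stage 1 (toy tables, NOT the paper's): source notation -> SAMPA / SAMPA-SC.
ARPABET_TO_SAMPA = {"P": "p", "B": "b", "M": "m", "IY": "i:", "AA": "A:"}
PINYIN_TO_SAMPA_SC = {"b": "p", "p": "p_h", "m": "m", "i": "i", "a": "a"}

# Stage 2 (assumed, illustrative values): SAMPA symbol -> FUL-style feature set.
SAMPA_TO_FEATURES = {
    "p": {"CONSONANTAL", "LABIAL"},
    "p_h": {"CONSONANTAL", "LABIAL", "SPREAD_GLOTTIS"},
    "b": {"CONSONANTAL", "LABIAL", "VOICE"},
    "m": {"CONSONANTAL", "SONORANT", "NASAL", "LABIAL"},
    "i": {"SONORANT", "HIGH"},
    "i:": {"SONORANT", "HIGH"},
    "a": {"SONORANT", "LOW"},
    "A:": {"SONORANT", "LOW", "DORSAL"},
}


def strip_stress(arpabet: str) -> str:
    """Drop a trailing ARPABET stress digit, e.g. 'IY1' -> 'IY'."""
    return re.sub(r"\d$", "", arpabet)


def to_features(symbol: str, source: str) -> frozenset[str]:
    """Map one ARPABET or pinyin segment to its feature set via SAMPA(-SC)."""
    if source == "arpabet":
        sampa = ARPABET_TO_SAMPA[strip_stress(symbol)]
    elif source == "pinyin":
        sampa = PINYIN_TO_SAMPA_SC[symbol]
    else:
        raise ValueError(f"unknown source notation: {source}")
    return frozenset(SAMPA_TO_FEATURES[sampa])


print(sorted(to_features("B", "arpabet")))  # ['CONSONANTAL', 'LABIAL', 'VOICE']
print(sorted(to_features("p", "pinyin")))   # ['CONSONANTAL', 'LABIAL', 'SPREAD_GLOTTIS']
```

In this toy setup, English `IY1` and pinyin `i` land on the same feature set, which illustrates how features can act as one shared input across the two languages.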
Findings
Phonological features enable cross-lingual speech synthesis.
Synthesized speech retains a source-language accent.
The approach supports language acquisition simulation.
Abstract
This study investigates whether the phonological features derived from the Featurally Underspecified Lexicon (FUL) model can be applied in text-to-speech systems to generate native and non-native speech in English and Mandarin. We present a mapping of ARPABET/pinyin to SAMPA/SAMPA-SC and then to phonological features. We tested whether this mapping could successfully generate native, non-native, and code-switched speech in the two languages. We ran two experiments, one with a small dataset and one with a larger dataset. The results suggest that phonological features are a feasible input system both for languages in the training data and for languages absent from it, although further investigation is needed to improve model performance. The results lend support to FUL by presenting successfully synthesised output, and by having the output carry a source-language accent when…
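A feature-based input lets a TTS front end treat phones from either language as points in one shared space. The following is a minimal illustration under assumed values, not the paper's encoding: it turns (language, phone) pairs into multi-hot vectors over a hypothetical feature inventory, encodes a short code-switched sequence, and checks whether a Mandarin phone uses any feature that never appears in an English-only training set. `FEATURES`, `PHONE_FEATURES`, `encode`, and `missing_features` are all hypothetical names.

```python
# Hypothetical shared feature inventory; list order fixes the vector positions.
FEATURES = ["CONSONANTAL", "SONORANT", "NASAL", "LABIAL",
            "DORSAL", "HIGH", "LOW", "VOICE"]

# Toy (language, phone) -> feature table; assumed values, not the paper's mapping.
PHONE_FEATURES: dict[tuple[str, str], set[str]] = {
    ("en", "b"): {"CONSONANTAL", "LABIAL", "VOICE"},
    ("en", "m"): {"CONSONANTAL", "SONORANT", "NASAL", "LABIAL"},
    ("en", "i:"): {"SONORANT", "HIGH"},
    ("en", "A:"): {"SONORANT", "LOW", "DORSAL"},
    ("zh", "a"): {"SONORANT", "LOW"},
}


def encode(phones):
    """Map (language, phone) pairs to multi-hot rows over FEATURES.

    Rows from both languages share one space, so a code-switched
    sequence needs no language-specific symbol table."""
    rows = []
    for phone in phones:
        feats = PHONE_FEATURES[phone]
        rows.append([1 if f in feats else 0 for f in FEATURES])
    return rows


def missing_features(phone, training_phones):
    """Return the features of `phone` that never occur in `training_phones`."""
    seen = set().union(*(PHONE_FEATURES[p] for p in training_phones))
    return PHONE_FEATURES[phone] - seen


english_train = [p for p in PHONE_FEATURES if p[0] == "en"]
print(missing_features(("zh", "a"), english_train))  # set()

# A code-switched sequence: English /b/, Mandarin /a/, English /m/.
for row in encode([("en", "b"), ("zh", "a"), ("en", "m")]):
    print(row)
# [1, 0, 0, 1, 0, 0, 0, 1]
# [0, 1, 0, 0, 0, 0, 1, 0]
# [1, 1, 1, 1, 0, 0, 0, 0]
```

Because every row has the same width regardless of language, a phone absent from training can still be encoded as long as its features were seen; this is one way to read the abstract's claim about languages not in the training data, not a description of the authors' model.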
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems
