An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space
Jihwan Lee, Jae-Sung Bae, Seongkyu Mun, Heejin Choi, Joun Yeop Lee, Hoon-Young Cho, Chanwoo Kim

TL;DR
This study uses vowel space analysis to evaluate L2 accents in cross-lingual TTS systems, revealing differences based on architecture and linguistic features, and highlighting the importance of language-specific handling.
Contribution
It introduces vowel space analysis as an effective tool to assess L2 accents in cross-lingual TTS, providing insights into architectural and linguistic factors affecting accent quality.
Findings
Glow-TTS (a parallel architecture) produces less L2-accented speech than Tacotron (an auto-regressive one).
Non-shared vowels exhibit stronger L2 accents.
L2 accents in TTS resemble those of human L2 learners.
Abstract
With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. Vowel space analysis, which is often used to explore various aspects of language including L2 accents, is a great alternative analysis tool. In this study, we apply the vowel space analysis method to explore L2 accents of cross-lingual TTS systems. Through the vowel space analysis, we observe the following three points: a) a parallel architecture (Glow-TTS) is less L2-accented than an auto-regressive one (Tacotron); b) L2 accents are more dominant in non-shared vowels in a language pair; and c) L2 accents of cross-lingual TTS systems share some phenomena with those of human L2 learners. Our findings imply that it is necessary for TTS systems to handle each…
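As a rough illustration of the idea behind a vowel space comparison, the sketch below places each vowel token in the (F1, F2) formant plane and asks whether the TTS output sits closer to native speech in the target language or to the nearest vowel of the speaker's source language. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names `centroid` and `accent_shift` are hypothetical, the formant values in the usage example are made-up placeholders rather than measurements, and raw Hz values are used for simplicity, whereas a real analysis would typically normalize formants across speakers first.

```python
import math
from statistics import mean


def centroid(tokens):
    """Mean (F1, F2) position, in Hz, of a list of (F1, F2) vowel tokens."""
    f1_values, f2_values = zip(*tokens)
    return (mean(f1_values), mean(f2_values))


def accent_shift(tts_tokens, target_tokens, source_tokens):
    """Hypothetical accent score for one vowel, in the range [0, 1].

    0.0 means the TTS centroid coincides with the native target-language
    centroid; 1.0 means it coincides with the source-language centroid.
    """
    tts = centroid(tts_tokens)
    d_target = math.dist(tts, centroid(target_tokens))
    d_source = math.dist(tts, centroid(source_tokens))
    total = d_target + d_source
    # Degenerate case: all three centroids coincide.
    return 0.0 if total == 0 else d_target / total


if __name__ == "__main__":
    # Placeholder formant values (Hz), for illustration only.
    native_target = [(500, 1500), (520, 1480)]
    native_source = [(700, 1200), (690, 1220)]
    tts_output = [(600, 1350), (610, 1340)]
    print(f"accent shift: {accent_shift(tts_output, native_target, native_source):.2f}")
```

A score near 0 would suggest the synthesized vowel is close to native target-language speech, while a score near 1 would suggest it has drifted toward the source language, which is one simple way to read "more L2-accented" off a vowel space.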
Taxonomy
Topics: Phonetics and Phonology Research
