Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
Thanathai Lertpetchpun, Yoonjeong Lee, Thanapat Trachu, Jihwan Lee, Tiantian Feng, Dani Byrd, Shrikanth Narayanan

TL;DR
This paper investigates how speaker embeddings interact with phonological rules in accented speech synthesis, proposing a metric to quantify their influence and demonstrating improved accent control through combined rule and embedding approaches.
Contribution
It introduces the phoneme shift rate (PSR) metric and analyzes the entanglement between speaker embeddings and phonological rules in TTS systems.
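The summary does not say how PSR is computed. Below is one plausible reading, written as a minimal sketch; it is not the authors' definition. It assumes three phoneme sequences are available and already aligned one-to-one: the source transcription, the same transcription after the phonological rules are applied, and phonemes recognized from the synthesized audio. PSR is then taken as the share of rule-changed positions where the output drifts away from the rule target. The function name `phoneme_shift_rate` and the example transcriptions are hypothetical.

```python
from typing import Sequence


def phoneme_shift_rate(
    source: Sequence[str], ruled: Sequence[str], output: Sequence[str]
) -> float:
    """Hypothetical PSR: fraction of rule-changed positions whose phoneme in the
    synthesized output no longer matches the rule-applied target.

    Assumes all three sequences are aligned one-to-one (equal length), so only
    substitutions are handled; insertions and deletions are out of scope.
    """
    if not (len(source) == len(ruled) == len(output)):
        raise ValueError("sequences must be aligned to equal length")
    # Positions where a phonological rule actually changed the input phoneme.
    changed = [i for i, (s, r) in enumerate(zip(source, ruled)) if s != r]
    if not changed:
        return 0.0  # No rule fired, so there is nothing to override.
    # A "shift" here means the synthesized phoneme deviates from the rule target.
    shifted = sum(1 for i in changed if output[i] != ruled[i])
    return shifted / len(changed)


# Toy, made-up example for "water": rules map an American-style transcription
# to a British-style target; "output" stands in for recognized phonemes.
source = ["w", "ɑ", "ɾ", "ɚ"]
ruled = ["w", "ɔː", "t", "ə"]
output = ["w", "ɑ", "t", "ə"]
print(phoneme_shift_rate(source, ruled, output))  # prints 0.3333333333333333
```

Under this reading, a PSR near 0 means the speaker embedding preserves the rule-based transformations, while a value near 1 means the embedding overrides them. In the toy example, the flap and rhoticity rules survive but the vowel reverts.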
Findings
Combining rules with embeddings improves accent authenticity.
Embeddings can override phonological rules, affecting accent control.
Rules serve as a controllable lever for accent manipulation.
Abstract
Many spoken languages, including English, exhibit wide variation in dialects and accents, making accent control an important capability for flexible text-to-speech (TTS) models. Current TTS systems typically generate accented speech by conditioning on speaker embeddings associated with specific accents. While effective, this approach offers limited interpretability and controllability, as embeddings also encode traits such as timbre and emotion. In this study, we analyze the interaction between speaker embeddings and linguistically motivated phonological rules in accented speech synthesis. Using American and British English as a case study, we implement rules for flapping, rhoticity, and vowel correspondences. We propose the phoneme shift rate (PSR), a novel metric quantifying how strongly embeddings preserve or override rule-based transformations. Experiments show that combining rules…
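The abstract names three rule families: flapping, rhoticity, and vowel correspondences. The following is a minimal sketch of what word-level rules of this kind might look like. It assumes IPA token lists, an American-to-British direction, and a tiny illustrative vowel map. The names `to_british`, `VOWELS`, and `VOWEL_MAP` are hypothetical, and the paper's actual rule set and phoneme inventory are not given here.

```python
# Toy phoneme inventory, used only to decide whether /ɹ/ is followed by a vowel.
VOWELS = {"i", "ɪ", "eɪ", "ɛ", "æ", "ɑ", "ɔ", "oʊ", "ʊ", "u", "ʌ",
          "ə", "ɚ", "ɝ", "aɪ", "aʊ", "ɔɪ"}

# Illustrative American -> British vowel correspondences; far from complete.
VOWEL_MAP = {"oʊ": "əʊ", "ɝ": "ɜː", "ɚ": "ə"}


def to_british(phones: list[str]) -> list[str]:
    """Apply three toy rules to one word's IPA phonemes (American-style input)."""
    out: list[str] = []
    for i, p in enumerate(phones):
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if p == "ɾ":
            # Flapping, undone: assumes every flap comes from an underlying /t/.
            out.append("t")
        elif p == "ɹ" and (nxt is None or nxt not in VOWELS):
            # Non-rhoticity: drop /ɹ/ unless a vowel follows within the word.
            continue
        else:
            # Vowel correspondences, including loss of r-coloring on ɚ and ɝ.
            out.append(VOWEL_MAP.get(p, p))
    return out


print(to_british(["w", "ɑ", "ɾ", "ɚ"]))  # water -> ['w', 'ɑ', 't', 'ə']
print(to_british(["k", "ɑ", "ɹ", "d"]))  # card  -> ['k', 'ɑ', 'd']
print(to_british(["ɹ", "ɛ", "d"]))       # red   -> ['ɹ', 'ɛ', 'd']
```

The sketch works one word at a time, so it ignores linking /r/ across word boundaries, vowel length (British "card" has a long vowel), and lexically conditioned sets such as BATH and LOT. It also treats every flap as /t/, although some flaps come from /d/.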
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Language Development and Disorders
