Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis

Thanathai Lertpetchpun; Yoonjeong Lee; Thanapat Trachu; Jihwan Lee; Tiantian Feng; Dani Byrd; Shrikanth Narayanan

arXiv:2601.14417·cs.CL·January 29, 2026

TL;DR

This paper investigates how speaker embeddings interact with phonological rules in accented speech synthesis, proposing a metric to quantify their influence and demonstrating improved accent control through combined rule and embedding approaches.

Contribution

It introduces the phoneme shift rate (PSR) metric and analyzes the entanglement between speaker embeddings and phonological rules in TTS systems.
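A rate of this kind can be illustrated with a small sketch. This summary does not give the paper's exact definition of PSR, so the version below is an assumption for illustration only: it takes the phoneme sequence the rules were meant to produce, the sequence recognized in the synthesized audio, and the positions the rules targeted, and reports the fraction of targeted positions where the realized phoneme differs from the intended one. The function name `phoneme_shift_rate`, the `rule_sites` argument, and the requirement that both sequences are already aligned are all hypothetical.

```python
from typing import Sequence


def phoneme_shift_rate(
    intended: Sequence[str],
    realized: Sequence[str],
    rule_sites: Sequence[int],
) -> float:
    """Illustrative shift rate (NOT the paper's definition).

    intended:   phonemes after the phonological rules were applied
    realized:   phonemes recognized in the synthesized speech
    rule_sites: indices in `intended` that a rule modified

    Assumes the two sequences are already aligned position by position.
    """
    if len(intended) != len(realized):
        raise ValueError("sequences must be aligned to the same length")
    if not rule_sites:
        return 0.0
    # Count rule-targeted positions where the output did not keep the rule's phoneme.
    shifted = sum(1 for i in rule_sites if intended[i] != realized[i])
    return shifted / len(rule_sites)


# "water" in ARPAbet-style symbols: the rules asked for an unflapped T and a
# non-rhotic final vowel; the synthesized output kept the vowel but flapped the T.
intended = ["W", "AO", "T", "AH"]
realized = ["W", "AO", "DX", "AH"]
rate = phoneme_shift_rate(intended, realized, rule_sites=[2, 3])
print(rate)
```

Under this assumed definition, a value near 0 would mean the rule-based transformations survive synthesis, and a value near 1 would mean the speaker embedding overrides them.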

Findings

01. Combining rules with embeddings improves accent authenticity.

02. Embeddings can override phonological rules, affecting accent control.

03. Rules serve as a controllable lever for accent manipulation.

Abstract

Many spoken languages, including English, exhibit wide variation in dialects and accents, making accent control an important capability for flexible text-to-speech (TTS) models. Current TTS systems typically generate accented speech by conditioning on speaker embeddings associated with specific accents. While effective, this approach offers limited interpretability and controllability, as embeddings also encode traits such as timbre and emotion. In this study, we analyze the interaction between speaker embeddings and linguistically motivated phonological rules in accented speech synthesis. Using American and British English as a case study, we implement rules for flapping, rhoticity, and vowel correspondences. We propose the phoneme shift rate (PSR), a novel metric quantifying how strongly embeddings preserve or override rule-based transformations. Experiments show that combining rules…
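To make the rule types named in the abstract concrete, here is a toy sketch of flapping and non-rhoticity applied to ARPAbet-style phoneme lists. It is not the authors' rule set: the function names, the simplified contexts (stress is ignored, and word boundaries are treated as the ends of the list), and the choice of `AH` as the non-rhotic replacement for `ER` are assumptions made for illustration. Vowel correspondences are left out because they usually depend on the lexical item, not only on the phoneme context.

```python
# ARPAbet vowel symbols without stress digits.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def apply_flapping(phones):
    """American-style flapping: T between two vowels becomes the flap DX.

    Simplified: real flapping also depends on stress, which is ignored here.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] == "T" and phones[i - 1] in VOWELS and phones[i + 1] in VOWELS:
            out[i] = "DX"
    return out


def apply_non_rhoticity(phones):
    """British-style non-rhoticity: drop R after a vowel unless a vowel follows.

    The r-colored vowel ER is replaced by AH (an assumed stand-in for schwa).
    """
    out = []
    for i, p in enumerate(phones):
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if p == "R" and i > 0 and phones[i - 1] in VOWELS and nxt not in VOWELS:
            continue  # postvocalic R with no following vowel is dropped
        out.append("AH" if p == "ER" else p)
    return out


water = ["W", "AO", "T", "ER"]
print(apply_flapping(water))       # flapped, rhotic variant
print(apply_non_rhoticity(water))  # unflapped, non-rhotic variant
```

Keeping each rule as a separate function means rules can be switched on or off independently, which is the kind of explicit control that an accent embedding alone does not offer.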

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Language Development and Disorders