Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora

Kentaro Onda; Keisuke Imoto; Satoru Fukayama; Daisuke Saito; Nobuaki Minematsu

arXiv:2505.16191·cs.SD·May 23, 2025

TL;DR

This paper enhances foreign accent simulation by integrating duration modification into a discrete token-based resynthesis method, enabling more accurate replication of durational accents in synthesized speech using only native speech data.

Contribution

It introduces a novel duration modification technique to improve foreign accent simulation, addressing a key limitation of previous methods that could not reproduce durational accents.

Findings

01. Successfully replicates durational accents in synthesized speech

02. Synthesized accented speech can serve as listening material for humans and as training data for ASR, which is expected to make both more robust to foreign accents

03. Enhances naturalness of simulated foreign accents

Abstract

Recently, a method was proposed for synthesizing foreign-accented speech using only native speech data, based on discrete tokens obtained from self-supervised learning (SSL) models. Given the limited availability of accented speech data, this method is expected to make foreign accents much easier to simulate. When the synthesized accented speech is used as listening material for humans or as training data for automatic speech recognition (ASR), both human listeners and ASR systems are expected to become more robust to foreign accents. However, the previous method has a fatal flaw: it cannot reproduce duration-related accents. Durational accents are common when L2 speakers whose native language has syllable-timed or mora-timed rhythm speak a stress-timed language such as English. In this paper, we integrate duration modification into the previous method to simulate foreign accents more accurately.…
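To make the idea of duration modification on discrete tokens concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that speech is represented as one integer token per frame, so that a run of repeated tokens encodes a segment's duration; that assumption, the `alpha` parameter, and all function names are hypothetical. The sketch pulls each segment's duration toward the utterance mean, loosely mimicking the more even timing associated with syllable-timed or mora-timed rhythm.

```python
from itertools import groupby


def run_length_encode(tokens):
    """Collapse consecutive repeats into (token, duration_in_frames) pairs."""
    return [(tok, sum(1 for _ in grp)) for tok, grp in groupby(tokens)]


def flatten_durations(segments, alpha):
    """Pull each duration toward the mean duration.

    alpha is a hypothetical knob: 0.0 keeps the original durations,
    1.0 gives every segment the (rounded) mean duration.
    """
    mean = sum(d for _, d in segments) / len(segments)
    return [
        (tok, max(1, round((1 - alpha) * d + alpha * mean)))
        for tok, d in segments
    ]


def expand(segments):
    """Turn (token, duration) pairs back into a frame-level token sequence."""
    return [tok for tok, d in segments for _ in range(d)]


# Toy frame-level token sequence (assumed format, not real SSL units).
tokens = [7, 7, 7, 7, 7, 7, 12, 12, 3, 3, 3, 3]
segments = run_length_encode(tokens)
modified = expand(flatten_durations(segments, alpha=1.0))
```

In a resynthesis pipeline, the modified token sequence would then be passed to a token-to-speech model; how durations are actually chosen in the paper is not stated in this summary, so the mean-based rule here is only a stand-in.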

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Voice and Speech Disorders