A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora

Kentaro Onda; Joonyong Park; Nobuaki Minematsu; Daisuke Saito

arXiv:2407.11370 · cs.SD · July 17, 2024

TL;DR

This paper introduces a GSLM-based method that simulates foreign accentuation using only native speech corpora. It produces natural-sounding accented speech whose degree of accentuation is controllable.

Contribution

It presents an approach to simulating foreign accents in which speech of one language is fed into the GSLM of another language. Because only native speech corpora are used, no accented (non-native) speech data is required.

Findings

01. The synthesized accented speech is highly natural.
02. The degree of accentuation is controllable.
03. The method outperforms naive baselines, such as running L1 ASR on the foreign input and passing the result to L1 TTS.

Abstract

We propose a method of simulating the human process of foreign accentuation using a Generative Spoken Language Model (GSLM) with only native speech corpora. When one listens to spoken words of a foreign language and repeats them, the repeated speech often carries the accent of the listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units of the L1, and those units are used for oral reproduction. We simulate this process by inputting speech of language A into the GSLM of language B to add B's accent onto the input speech. Running ASR of the L1 on foreign input speech and giving the ASR result to TTS of the L1 can be viewed as a naive implementation of this approach. The results of our experiments show that the synthesized accent of the output speech is highly natural, compared to real samples of A generated by…
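The key step, hearing foreign speech "through" L1 units, can be illustrated with the discrete-unit encoding that GSLM-style pipelines typically use: self-supervised speech features (e.g., HuBERT) are quantized with k-means, so a codebook learned only on language B can describe any input solely in terms of B's units. The sketch below is a toy under stated assumptions, not the authors' implementation: 2-D points stand in for real feature frames, the data, codebook size, and helper names (`kmeans`, `quantize`, `dedup`) are hypothetical, and only the encoding step is shown (no unit language model or speech resynthesis). A sound missing from B's inventory is snapped to the nearest B unit, mirroring the L1 substitution described above.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], frame: Sequence[float]) -> int:
    """Index of the codebook entry (discrete unit) closest to a frame."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], frame))


def kmeans(data: Sequence[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain k-means with deterministic farthest-first initialization."""
    centroids: List[Vector] = [tuple(data[0])]
    while len(centroids) < k:
        # Add the frame that is farthest from every centroid chosen so far.
        far = max(data, key=lambda v: min(sq_dist(c, v) for c in centroids))
        centroids.append(tuple(far))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in data:
            clusters[nearest(centroids, v)].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Rewrite every frame as the ID of its nearest codebook unit."""
    return [nearest(codebook, f) for f in frames]


def dedup(units: Sequence[int]) -> List[int]:
    """Collapse runs of repeated units into a single unit."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


# Hypothetical toy "features" for language B: three sound categories.
OFFSETS = [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
PROTOTYPES_B = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
b_frames = [(px + dx, py + dy) for px, py in PROTOTYPES_B for dx, dy in OFFSETS]

# Hypothetical language-A utterance; frames 3-4 lie near (1.5, 2.5),
# a sound that B's inventory lacks.
a_frames = [(0.05, 0.0), (0.0, 0.05), (1.5, 2.5), (1.6, 2.4), (3.9, 0.1)]

codebook_b = kmeans(b_frames, k=3)  # learned on B's frames only
print(dedup(quantize(a_frames, codebook_b)))  # -> [0, 2, 1]
```

Running it prints `[0, 2, 1]`: both frames of the foreign sound are rewritten as unit 2, the closest category B has. Collapsing repeated units follows a common practice in GSLM-style unit pipelines; whether this paper does so is an assumption here.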


Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis