MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion

Sho Inoue; Shuai Wang; Wanxing Wang; Pengcheng Zhu; Mengxiao Bi; Haizhou Li

arXiv:2409.09352·cs.SD·January 13, 2025

Open Access · 1 Repo

TL;DR

This paper introduces MacST, a novel approach for multi-accent speech synthesis using text transliteration and multilingual TTS models, enabling effective accent conversion while maintaining speaker identity and content.

Contribution

The study presents a new method combining text transliteration with TTS for creating multi-accent speech datasets and accent conversion, leveraging large language models for transliteration.

Findings

01. Effective accent conversion demonstrated through subjective evaluations.

02. Synthetic dataset improves accent conversion quality.

03. Method works for both native and non-native speakers.

Abstract

In accented voice conversion, or accent conversion, we seek to convert the accent in speech from one to another while preserving speaker identity and semantic content. In this study, we formulate a novel method for creating multi-accented speech samples, that is, pairs of accented speech samples from the same speaker, through text transliteration for training accent conversion systems. We begin by generating transliterated text with Large Language Models (LLMs), which is then fed into multilingual TTS models to synthesize accented English speech. As a reference system, we built a sequence-to-sequence model on the synthetic parallel corpus for accent conversion. We validated the proposed method for both native and non-native English speakers. Subjective and objective evaluations further validate our dataset's effectiveness in accent conversion studies.
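The two-step data generation described in the abstract (LLM transliteration, then multilingual TTS) could be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `build_parallel_corpus`, `AccentedSample`, `toy_transliterate`, and `toy_synthesize`, the accent codes `"ja"` and `"hi"`, and the speaker label `"spk0"` are all hypothetical, and the toy functions merely stand in for a real LLM and a real TTS model so that the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AccentedSample:
    source_text: str  # original English sentence
    accent: str       # hypothetical accent / target-script code
    tts_input: str    # transliterated text fed to the TTS model
    audio: bytes      # synthesized speech (placeholder type)


def build_parallel_corpus(
    sentences: List[str],
    accents: List[str],
    transliterate: Callable[[str, str], str],
    synthesize: Callable[[str, str, str], bytes],
    speaker_id: str,
) -> Dict[str, List[AccentedSample]]:
    """For each sentence, produce one sample per accent in the same voice."""
    corpus: Dict[str, List[AccentedSample]] = {}
    for sentence in sentences:
        samples = []
        for accent in accents:
            # Step 1: an LLM rewrites the English sentence in the target script.
            tts_input = transliterate(sentence, accent)
            # Step 2: a multilingual TTS model reads that text in one voice.
            audio = synthesize(tts_input, accent, speaker_id)
            samples.append(AccentedSample(sentence, accent, tts_input, audio))
        corpus[sentence] = samples
    return corpus


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_transliterate(text: str, accent: str) -> str:
    return f"<{accent}>{text}</{accent}>"


def toy_synthesize(tts_input: str, accent: str, speaker_id: str) -> bytes:
    return f"{speaker_id}|{accent}|{tts_input}".encode("utf-8")


corpus = build_parallel_corpus(
    ["hello world"], ["ja", "hi"], toy_transliterate, toy_synthesize, "spk0"
)
```

Because every sample for a given sentence shares the same speaker label and source text, any two entries in `corpus[sentence]` form the kind of same-speaker accent pair that a sequence-to-sequence accent conversion model could be trained on.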

Code & Models

Repositories

shinshoji01/macst-project-page
Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems