MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue, Shuai Wang, Wanxing Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li

TL;DR
This paper introduces MacST, a novel approach for multi-accent speech synthesis using text transliteration and multilingual TTS models, enabling effective accent conversion while maintaining speaker identity and content.
Contribution
The study presents a new method that combines LLM-based text transliteration with multilingual TTS to create multi-accent speech datasets for training accent conversion systems.
Findings
Effective accent conversion demonstrated through subjective evaluations.
Synthetic dataset improves accent conversion quality.
Method works for both native and non-native speakers.
Abstract
In accented voice conversion, or accent conversion, we seek to convert speech from one accent to another while preserving speaker identity and semantic content. In this study, we formulate a novel method for creating multi-accented speech samples, i.e., pairs of differently accented speech samples from the same speaker, through text transliteration for training accent conversion systems. We begin by generating transliterated text with Large Language Models (LLMs), which is then fed into multilingual TTS models to synthesize accented English speech. As a reference system, we built a sequence-to-sequence accent conversion model on the resulting synthetic parallel corpus. We validated the proposed method for both native and non-native English speakers. Subjective and objective evaluations further validate our dataset's effectiveness in accent conversion studies.
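The abstract's data-creation pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `transliterate` and `synthesize` are hypothetical callables. They stand in for the LLM transliteration step and for a multilingual TTS model that keeps the speaker's identity. The accent labels ("en", "ja", "hi") are placeholders. Pairing every ordered accent combination is one plausible way, assumed here, to form source/target examples for a sequence-to-sequence accent conversion model.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List


@dataclass(frozen=True)
class Utterance:
    speaker: str
    accent: str
    text: str    # the (possibly transliterated) text fed to TTS
    audio: bytes  # placeholder for the synthesized waveform


@dataclass(frozen=True)
class Pair:
    source: Utterance
    target: Utterance


def build_parallel_corpus(
    sentences: List[str],
    speakers: List[str],
    accents: List[str],
    transliterate: Callable[[str, str], str],    # hypothetical LLM step
    synthesize: Callable[[str, str, str], bytes],  # hypothetical TTS step
) -> List[Pair]:
    """Build same-speaker, same-content pairs that differ only in accent."""
    pairs: List[Pair] = []
    for sentence in sentences:
        for speaker in speakers:
            # One utterance per accent: same speaker, same underlying sentence.
            utts = []
            for accent in accents:
                text = transliterate(sentence, accent)
                audio = synthesize(text, speaker, accent)
                utts.append(Utterance(speaker, accent, text, audio))
            # Assumption: every ordered (source, target) accent combination
            # becomes one training example for accent conversion.
            for src, tgt in permutations(utts, 2):
                pairs.append(Pair(src, tgt))
    return pairs


def toy_transliterate(sentence: str, accent: str) -> str:
    # Hypothetical stand-in for an LLM call; a real system would rewrite the
    # English sentence in the target language's writing system.
    return sentence if accent == "en" else f"[{accent}] {sentence}"


def toy_synthesize(text: str, speaker: str, accent: str) -> bytes:
    # Hypothetical stand-in for a multilingual TTS model.
    return f"{speaker}|{accent}|{text}".encode()


corpus = build_parallel_corpus(
    ["hello world"], ["spk1"], ["en", "ja", "hi"], toy_transliterate, toy_synthesize
)
print(len(corpus))  # 6: one sentence x one speaker x 3*2 ordered accent pairs
```

Because the two sides of each pair share the speaker and the sentence, the only systematic difference a model sees between source and target is the accent. This is the property that makes such a synthetic corpus usable for parallel training.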
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems
