Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
Wen-Chin Huang, Tomoki Toda

TL;DR
This paper evaluates three recent methods for ground-truth-free foreign accent conversion (FAC) built on sequence-to-sequence (seq2seq) and non-parallel voice conversion (VC) models. It finds that no single method is best on all evaluation axes and highlights how difficult accentedness is to measure.
Contribution
The study provides a comprehensive evaluation of ground-truth-free FAC methods, analyzes their effectiveness, and discusses the limitations of intelligibility metrics. An open-source implementation is provided for reproducibility.
Findings
No single method outperformed others across all metrics
Intelligibility measures do not correlate well with perceived accentedness
Open-source implementation facilitates future research
Abstract
Foreign accent conversion (FAC) is a special application of voice conversion (VC) that aims to convert the accented speech of a non-native speaker into native-sounding speech with the same speaker identity. FAC is difficult because native speech from the desired non-native speaker, which would serve as the training target, is impossible to collect. In this work, we evaluate three recently proposed methods for ground-truth-free FAC, all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity. Our experimental results show that no single method was significantly better than the others on all evaluation axes, in contrast to conclusions drawn in previous studies. We also explain the effectiveness of these methods with the training input and output of the seq2seq model…
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Voice and Speech Disorders
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
