Accent Conversion with Articulatory Representations
Yashish M. Siriwardena, Nathan Swedlow, Audrey Howard, Evan Gitterman, Dan Darcy, Carol Espy-Wilson, Andrea Fanelli

TL;DR
This paper proposes using articulatory speech representations, extracted with an acoustic-to-articulatory speech inversion system, to improve accent conversion from non-native to native (American) English. The representations are combined with phonetic posteriorgrams in a multi-task learning acoustic model.
Contribution
It integrates articulatory representations with conventional phonetic posteriorgrams in a multi-task learning based acoustic model for accent conversion.
Findings
Articulatory representations improve accent conversion quality.
Multi-task learning enhances the acoustic model's performance.
Objective and subjective evaluations confirm the effectiveness of the approach.
Abstract
Conversion of non-native accented speech to native (American) English has a wide range of applications, such as improving the intelligibility of non-native speech. Previous work in this domain has used phonetic posteriorgrams as the target speech representation to train an acoustic model, which is then used to extract a compact representation of input speech for accent conversion. In this work, we introduce the idea of using an effective articulatory speech representation, extracted from an acoustic-to-articulatory speech inversion system, to improve the acoustic model used in accent conversion. The idea of incorporating articulatory representations originates from their ability to characterize accents in speech well. To incorporate articulatory representations with conventional phonetic posteriorgrams, a multi-task learning based acoustic model is proposed. Objective and subjective evaluations…
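The multi-task idea in the abstract can be illustrated with a small sketch of a combined training loss. This is not the authors' implementation: the summary does not state the loss functions, the task weighting, or the model architecture. The sketch assumes a frame-level cross-entropy against phonetic posteriorgram targets plus a mean-squared error against articulatory targets, joined by a hypothetical weight `artic_weight`; every function name here is illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame of phoneme logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_posteriors):
    """Cross-entropy between predicted and target phoneme posteriors for one frame."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_posteriors, probs) if t > 0)

def mse(pred, target):
    """Mean-squared error between predicted and target articulatory features."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def multitask_loss(ppg_logits, ppg_targets, artic_preds, artic_targets,
                   artic_weight=0.5):
    """Average over frames of (phonetic loss + artic_weight * articulatory loss).

    Assumption: both tasks share one acoustic model and are trained jointly;
    artic_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    frame_losses = [
        cross_entropy(logits, tgt) + artic_weight * mse(a_pred, a_tgt)
        for logits, tgt, a_pred, a_tgt
        in zip(ppg_logits, ppg_targets, artic_preds, artic_targets)
    ]
    return sum(frame_losses) / len(frame_losses)

# One frame, two phoneme classes, two articulatory dimensions (toy values).
loss = multitask_loss(
    ppg_logits=[[0.0, 0.0]],
    ppg_targets=[[1.0, 0.0]],
    artic_preds=[[1.0, 1.0]],
    artic_targets=[[0.0, 0.0]],
)
print(loss)
```

In a setup like this, the articulatory term acts as an auxiliary objective: setting `artic_weight` to 0 recovers a posteriorgram-only acoustic model, which gives a natural baseline for comparison.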
Taxonomy
Topics: Speech and dialogue systems · Subtitles and Audiovisual Media
