Accent Conversion with Articulatory Representations

Yashish M. Siriwardena; Nathan Swedlow; Audrey Howard; Evan Gitterman; Dan Darcy; Carol Espy-Wilson; Andrea Fanelli

arXiv:2406.05947 · eess.AS · June 11, 2024 · Interspeech

Open Access

TL;DR

This paper proposes using articulatory speech representations, extracted via acoustic-to-articulatory speech inversion, to improve accent conversion from non-native to native (American) English, combining them with phonetic posteriorgrams in a multi-task learning acoustic model.

Contribution

It introduces a novel approach that integrates articulatory representations with phonetic posteriorgrams through multi-task learning for accent conversion.

Findings

01. Articulatory representations improve accent conversion quality.

02. Multi-task learning enhances the acoustic model's performance.

03. Objective and subjective evaluations confirm the effectiveness.

Abstract

Conversion of non-native accented speech to native (American) English has a wide range of applications, such as improving the intelligibility of non-native speech. Previous work in this domain has used phonetic posteriorgrams as the target speech representation to train an acoustic model, which is then used to extract a compact representation of input speech for accent conversion. In this work, we introduce the idea of using an effective articulatory speech representation, extracted from an acoustic-to-articulatory speech inversion system, to improve the acoustic model used in accent conversion. The idea of incorporating articulatory representations originates from their ability to characterize accents in speech well. To combine articulatory representations with conventional phonetic posteriorgrams, a multi-task learning based acoustic model is proposed. Objective and subjective evaluations…
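The multi-task idea in the abstract can be illustrated with a small sketch of a joint training objective. The abstract does not specify the loss functions, the task weighting, or the network architecture, so everything here is an assumption made for illustration: a cross-entropy term against soft phonetic posteriorgram targets, a mean-squared-error term against articulatory targets, and a hypothetical `artic_weight` parameter that balances the two. The function names are also hypothetical.

```python
import math

def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def phonetic_loss(frame_logits, frame_ppg):
    """Mean cross-entropy between predicted phone posteriors and
    target posteriorgram frames (assumed loss, not from the paper)."""
    losses = []
    for logits, target in zip(frame_logits, frame_ppg):
        probs = softmax(logits)
        losses.append(-sum(t * math.log(p) for t, p in zip(target, probs)))
    return sum(losses) / len(losses)

def articulatory_loss(pred, target):
    """Mean squared error over all frames and articulatory dimensions
    (assumed loss, not from the paper)."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count

def multitask_loss(frame_logits, frame_ppg, artic_pred, artic_target,
                   artic_weight=0.5):
    """Weighted sum of the two task losses; artic_weight is a
    hypothetical hyperparameter chosen for this sketch."""
    return (phonetic_loss(frame_logits, frame_ppg)
            + artic_weight * articulatory_loss(artic_pred, artic_target))

# One frame, two phone classes, two articulatory dimensions.
loss = multitask_loss([[0.0, 0.0]], [[1.0, 0.0]],
                      [[1.0, 0.0]], [[0.0, 0.0]])
print(loss)
```

In a setup like this, both terms would be computed from the outputs of a shared acoustic encoder with two task-specific heads, so gradients from the articulatory task shape the representation that is later extracted for accent conversion. Whether the paper uses this exact arrangement is not stated in the abstract.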

Taxonomy

Topics: Speech and dialogue systems · Subtitles and Audiovisual Media