Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows

Abdelhamid Ezzerg; Thomas Merritt; Kayoko Yanagisawa; Piotr Bilinski; Magdalena Proszewska; Kamil Pokora; Renard Korzeniowski; Roberto Barra-Chicote; Daniel Korzekwa

arXiv:2211.05850 · eess.AS · November 14, 2022 · SLT

Open Access

TL;DR

This paper introduces an accent conversion method based on normalizing flows that remaps the phonetic conditioning, warps the duration of the converted speech, and uses attention to implicitly align source and target speech sequences, converting accents while preserving speech naturalness and intelligibility.

Contribution

It presents a novel flow-based framework that combines phonetic remapping, duration warping, and attention for non-parallel many-to-many accent conversion, and reports improvements over a competitive CopyCat baseline.

Findings

01. Outperforms baseline in accent similarity

02. Enhances speech naturalness and intelligibility

03. Effectively handles non-parallel data

Abstract

Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content), but also impact prosodic aspects of speech such as speaking rate and intonation. This paper investigates a novel flow-based approach to accent conversion using normalizing flows. The proposed approach revolves around three steps: remapping the phonetic conditioning, to better match the target accent, warping the duration of the converted speech, to better suit the target phonemes, and an attention mechanism that implicitly aligns source and target speech sequences. The proposed remap-warp-attend system enables adaptation of both phonetic and prosodic aspects of speech while allowing for source and converted speech signals to be of different lengths. Objective and subjective evaluations show that the proposed approach significantly outperforms a competitive CopyCat baseline model in…
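The warp and attend steps described in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: it has no normalizing flow, the remapped phoneme vectors and warped durations are simply given as inputs, and every name and number below (expand_by_duration, cross_attention, the toy frames) is hypothetical. It only shows how scaled dot-product attention lets converted speech have a different number of frames than the source.

```python
import math


def expand_by_duration(phoneme_vectors, durations):
    """Repeat each phoneme vector for its (assumed, already warped) frame count."""
    frames = []
    for vec, n_frames in zip(phoneme_vectors, durations):
        frames.extend([list(vec) for _ in range(n_frames)])
    return frames


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: one output row per query row."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Toy inputs, invented for illustration only.
source_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
target_phonemes = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for remapped phonetic conditioning
target_durations = [2, 1]                   # stand-in for warped durations, in frames

# Target-length queries attend over source-length keys and values.
queries = expand_by_duration(target_phonemes, target_durations)
converted, weights = cross_attention(queries, source_frames, source_frames)

print(len(source_frames), len(converted))
```

Here five source frames yield three converted frames, because the output length follows the queries (the duration-expanded target phonemes) rather than the source sequence.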

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing