Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows
Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa

TL;DR
This paper introduces a flow-based accent conversion method that remaps the phonetic conditioning, warps the duration of the converted speech, and uses attention to align source and target speech, converting the accent while preserving naturalness and intelligibility.
Contribution
It presents a novel flow-based framework that combines remapping, warping, and attention for non-parallel many-to-many accent conversion, improving over a competitive CopyCat baseline.
Findings
Outperforms baseline in accent similarity
Enhances speech naturalness and intelligibility
Effectively handles non-parallel data
Abstract
Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content) but also prosodic aspects of speech such as speaking rate and intonation. This paper investigates a novel approach to accent conversion using normalizing flows. The proposed approach revolves around three steps: remapping the phonetic conditioning to better match the target accent, warping the duration of the converted speech to better suit the target phonemes, and applying an attention mechanism that implicitly aligns source and target speech sequences. The proposed remap-warp-attend system enables adaptation of both phonetic and prosodic aspects of speech while allowing for source and converted speech signals to be of different lengths. Objective and subjective evaluations show that the proposed approach significantly outperforms a competitive CopyCat baseline model in…
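To make the remap-warp-attend data flow concrete, here is a minimal toy sketch in plain Python. It is not the authors' model: the paper uses normalizing flows and learned components, whereas this sketch assumes a hand-written phoneme table (`REMAP`), a hand-written frame-count table (`FRAMES`), and plain scaled dot-product attention. All names and values are hypothetical. It only illustrates how attention lets the converted sequence have a different length from the source.

```python
import math

# Hypothetical toy tables; a real system would learn these from data.
REMAP = {"AE": "AA"}                 # source-accent phoneme -> target-accent phoneme
FRAMES = {"B": 1, "AA": 3, "TH": 2}  # target phoneme -> number of output frames


def remap(phonemes):
    """Step 1 (remap): replace each phoneme by its target-accent counterpart."""
    return [REMAP.get(p, p) for p in phonemes]


def warp(phonemes):
    """Step 2 (warp): total output length implied by the target phonemes."""
    return sum(FRAMES.get(p, 1) for p in phonemes)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Step 3 (attend): scaled dot-product attention, one output frame per query."""
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return output


source_phonemes = ["B", "AE", "TH"]
target_phonemes = remap(source_phonemes)
n_out = warp(target_phonemes)

# Four toy source frames; the output has n_out frames instead.
source_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
# Toy queries, one per output frame (a stand-in for learned target-side features).
queries = [[math.cos(t), math.sin(t)] for t in range(n_out)]
converted = attend(queries, source_frames, source_frames)
```

In this sketch the source has four frames and `converted` has as many frames as the warped target requires, because each output frame is a weighted mix of all source frames rather than a one-to-one copy.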
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing
