Voice conversion with limited data and limitless data augmentations

Olga Slizovskaia; Jordi Janer; Pritish Chandna; Oscar Mayor

arXiv:2212.13581·cs.SD·December 29, 2022

Open Access

TL;DR

This paper investigates the effectiveness of various data augmentation techniques, including novel audio transformations, to improve real-time voice conversion systems with limited data, demonstrating enhanced performance through comprehensive evaluations.

Contribution

It introduces new data augmentation methods based on audio and voice transformations and evaluates their impact on real-time voice conversion.

Findings

01. Augmentation techniques improve conversion quality in low-data scenarios

02. Novel augmentation methods outperform traditional ones in subjective tests

03. Both male and female target conversions benefit from the proposed augmentations

Abstract

Voice conversion (VC) is the task of modifying an input speech signal so that the perceived speaker changes to a target speaker while the content of the input is preserved. It is a challenging but interesting task. Over the last few years, it has gained significant interest, and most systems now use data-driven machine learning models. Performing the conversion in a low-latency, real-world scenario is even more challenging, and it is further constrained by the availability of high-quality data. Data augmentations such as pitch shifting and noise addition are often used to increase the amount of training data for machine-learning-based models for this task. In this paper, we explore the efficacy of common data augmentation techniques for real-time voice conversion and also introduce novel data augmentation techniques based on audio and voice transformation effects. We evaluate the conversions for both…
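
To make the two common augmentations named in the abstract concrete, here is a minimal NumPy sketch of noise addition at a target signal-to-noise ratio and of a naive pitch shift by resampling. It is not the authors' pipeline: the function names `add_noise_at_snr` and `resample_pitch_shift`, the use of white Gaussian noise, and the linear-interpolation resampler are all assumptions made for illustration. The resampling approach also changes the clip duration, which a production pitch shifter would normally avoid.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled to a target SNR in dB (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)
    # Choose the noise power so that signal_power / noise_power == 10^(snr_db / 10).
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_power / noise_power)
    return signal + noise


def resample_pitch_shift(signal, semitones):
    """Naive pitch shift by resampling (illustrative helper).

    Reading the signal `rate` times faster raises the pitch by `semitones`
    when played back at the original sample rate, but it also shortens
    (or lengthens) the clip by the same factor.
    """
    signal = np.asarray(signal, dtype=np.float64)
    rate = 2.0 ** (semitones / 12.0)
    n_out = max(1, int(round(len(signal) / rate)))
    src_positions = np.arange(n_out) * rate
    # Linear interpolation between neighbouring input samples.
    return np.interp(src_positions, np.arange(len(signal)), signal)
```

A training loop could apply either function to each clip with randomly drawn parameters, for example an SNR in decibels or a shift in semitones. The ranges that work well depend on the data and are not specified in the text above.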

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing