A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

Benjamin van Niekerk; Marc-André Carbonneau; Julian Zaïdi; Mathew Baas; Hugo Seuté; Herman Kamper

arXiv:2111.02392·eess.AS·June 9, 2022

TL;DR

This paper compares discrete and soft speech units for voice conversion, showing that soft units better preserve linguistic content and improve speech naturalness by modeling uncertainty.

Contribution

The paper proposes soft speech units, learned by predicting a distribution over discrete speech units, which improve content preservation and speech quality in voice conversion.

Findings

1. Discrete units remove speaker information but discard some linguistic content, causing mispronunciations.

2. Soft units improve the intelligibility and naturalness of converted speech.

3. Modeling uncertainty over discrete units captures more linguistic content.

Abstract

The goal of voice conversion is to transform source speech into a target voice, keeping the content unchanged. In this paper, we focus on self-supervised representation learning for voice conversion. Specifically, we compare discrete and soft speech units as input features. We find that discrete representations effectively remove speaker information but discard some linguistic content, leading to mispronunciations. As a solution, we propose soft speech units. To learn soft units, we predict a distribution over discrete speech units. By modeling uncertainty, soft units capture more content information, improving the intelligibility and naturalness of converted speech. Samples available at https://ubisoft-laforge.github.io/speech/soft-vc/. Code available at https://github.com/bshall/soft-vc/.
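
The contrast between a hard (discrete) unit and a distribution over units can be illustrated with a small sketch. This is not the authors' implementation: the toy 2-D codebook, the cosine-similarity softmax, the temperature value, and all function names are assumptions chosen for illustration, and the raw frame stands in for what would be a learned soft unit.

```python
import math

def nearest_unit(frame, codebook):
    """Discrete unit: index of the closest centroid (hard assignment)."""
    dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
             for centroid in codebook]
    return dists.index(min(dists))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def unit_distribution(soft_unit, embeddings, temperature=0.1):
    """Softmax over scaled cosine similarities (temperature is an assumed value)."""
    logits = [cosine(soft_unit, e) / temperature for e in embeddings]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Loss that pushes the predicted distribution toward the discrete target."""
    return -math.log(probs[target])

# Hypothetical toy codebook of three units; in practice it would come from
# clustering self-supervised speech features.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# A frame lying between unit 0 and unit 1. For simplicity the frame itself
# is used as the "soft unit" and the centroids as the unit embeddings.
frame = [0.6, 0.5]

target = nearest_unit(frame, codebook)       # hard label, ambiguity is lost
probs = unit_distribution(frame, codebook)   # ambiguity is kept as probability mass
loss = cross_entropy(probs, target)
```

In this toy case the hard assignment picks a single unit, while the distribution keeps noticeable probability on the runner-up unit. That retained ambiguity is the kind of uncertainty the abstract credits with preserving more content information.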
