Exploring Disentanglement with Multilingual and Monolingual VQ-VAE

Jennifer Williams; Jason Fong; Erica Cooper; Junichi Yamagishi

arXiv:2105.01573 · eess.AS · June 29, 2021 · 1 citation

Open Access · 1 Repo

TL;DR

This paper investigates the use of disentangled phone and speaker representations from multilingual and monolingual VQ-VAE models for speech manipulation tasks like voice transformation and privacy masking.

Contribution

The paper shows how VQ-VAE phone and speaker representations can be used to manipulate speech content and speaker identity, and it introduces a technique for concealing the content of targeted words.

Findings

01. VQ representations are suitable for the four proof-of-concept speech manipulation tasks.

02. Mixing speaker representations can create new voices.

03. Masking the content of targeted words retains speaker identity and the intelligibility of what surrounds them.

Abstract

This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data. We explore the multi- and monolingual models using four small proof-of-concept tasks: copy-synthesis, voice transformation, linguistic code-switching, and content-based privacy masking. From these tasks, we reflect on how disentangled phone and speaker representations can be used to manipulate speech in a meaningful way. Our experiments demonstrate that the VQ representations are suitable for these tasks, including creating new voices by mixing speaker representations together. We also present our novel technique to conceal the content of targeted words within an utterance by manipulating phone VQ codes, while retaining speaker identity and intelligibility of surrounding…
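The abstract names three operations on VQ representations: quantizing frames to codebook entries, mixing speaker representations, and altering phone VQ codes to conceal content. The sketch below illustrates the general idea only and is not the authors' implementation. All function names are hypothetical. Linear interpolation for mixing and replacement with a single fixed code for masking are assumptions, since the abstract does not specify either mechanism.

```python
from typing import List, Sequence


def quantize(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry nearest to `frame` (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def mix_speakers(spk_a: Sequence[float], spk_b: Sequence[float],
                 alpha: float) -> List[float]:
    """Interpolate two speaker vectors (assumed mixing rule).

    alpha=0 returns spk_a, alpha=1 returns spk_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(spk_a, spk_b)]


def mask_phone_codes(codes: Sequence[int], start: int, end: int,
                     mask_code: int) -> List[int]:
    """Replace codes[start:end] with `mask_code` (assumed masking rule)."""
    if not 0 <= start <= end <= len(codes):
        raise ValueError("span must satisfy 0 <= start <= end <= len(codes)")
    return [mask_code if start <= i < end else c for i, c in enumerate(codes)]


# Toy data: a 3-entry codebook and 4 encoder frames (all values made up).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 0.1], [0.1, 0.8], [0.0, 0.1]]

codes = [quantize(f, codebook) for f in frames]      # [0, 1, 2, 0]
masked = mask_phone_codes(codes, 1, 3, mask_code=0)  # [0, 0, 0, 0]
new_voice = mix_speakers([1.0, 0.0], [0.0, 1.0], 0.25)  # [0.75, 0.25]
```

In a full system, the masked code sequence and the mixed speaker vector would be passed to the trained decoder to synthesize audio. That step is omitted here because it depends on the model in the linked repository.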

Code & Models

Repositories

rhoposit/multilingual_VQVAE (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection · Speech and Audio Processing

Methods: VQ-VAE