Voice Conversion with Conditional SampleRNN

Cong Zhou; Michael Horgan; Vivek Kumar; Cristina Vasco; Dan Darcy

arXiv:1808.08311·cs.SD·October 30, 2018

TL;DR

This paper introduces a novel voice conversion method using a conditioned SampleRNN model that preserves speech content while changing speaker identity, enabling flexible, many-to-many voice conversion without parallel data.

Contribution

The paper presents a conditioned SampleRNN approach to voice conversion that outperforms conventional VC methods in subjective evaluation and does not require parallel training data.

Findings

01 Outperforms conventional VC methods in subjective evaluations

02 Enables many-to-many voice conversion without parallel data

03 Preserves speech content while changing speaker identity

Abstract

Here we present a novel approach to conditioning the SampleRNN generative model for voice conversion (VC). Conventional methods for VC modify the perceived speaker identity by converting between source and target acoustic features. Our approach focuses on preserving voice content and depends on the generative network to learn voice style. We first train a multi-speaker SampleRNN model conditioned on linguistic features, pitch contour, and speaker identity using a multi-speaker speech corpus. Voice-converted speech is generated using linguistic features and pitch contour extracted from the source speaker, and the target speaker identity. We demonstrate that our system is capable of many-to-many voice conversion without requiring parallel data, enabling broad applications. Subjective evaluation demonstrates that our approach outperforms conventional VC methods.
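
The conditioning scheme described in the abstract can be illustrated with a minimal sketch. It only shows how per-frame conditioning vectors might be assembled from linguistic features, a pitch contour, and a speaker identity, and how conversion amounts to swapping the speaker identity while keeping the source content and pitch. The function names, the one-hot speaker encoding, the feature dimensions, and the toy values are all assumptions made for illustration; the abstract does not specify how the authors encode these inputs, and the SampleRNN network itself is not modeled here.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Encode a speaker index as a one-hot vector (an assumed encoding)."""
    if not 0 <= index < size:
        raise ValueError("speaker index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_conditioning(
    linguistic: Sequence[Sequence[float]],
    pitch: Sequence[float],
    speaker_id: int,
    num_speakers: int,
) -> List[List[float]]:
    """Concatenate linguistic features, pitch, and speaker identity per frame.

    Hypothetical helper: the result stands in for the conditioning input
    that a conditional generative model would receive at each frame.
    """
    if len(linguistic) != len(pitch):
        raise ValueError("linguistic features and pitch must have equal length")
    speaker = one_hot(speaker_id, num_speakers)
    return [list(frame) + [f0] + speaker for frame, f0 in zip(linguistic, pitch)]


# Toy source utterance: 3 frames, 2 linguistic dimensions (made-up values).
source_linguistic = [[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]]
# Pitch in Hz per frame; 0.0 marks an unvoiced frame (assumed convention).
source_pitch = [120.0, 125.0, 0.0]
NUM_SPEAKERS = 4

# Training-style conditioning: the source speaker's own identity (speaker 0).
source_cond = build_conditioning(source_linguistic, source_pitch, 0, NUM_SPEAKERS)

# Conversion: same linguistic features and pitch, target identity (speaker 2).
converted_cond = build_conditioning(source_linguistic, source_pitch, 2, NUM_SPEAKERS)
```

In this sketch the content and pitch portions of `source_cond` and `converted_cond` are identical and only the speaker portion differs, which mirrors the idea that the generative network, rather than a feature-to-feature mapping, supplies the target voice style. Because any source utterance can be paired with any speaker index seen during training, the same construction covers many-to-many conversion without parallel utterances.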
