Voice Conversion using Convolutional Neural Networks

Shariq Mobin; Joan Bruna

arXiv:1610.08927 · stat.ML · October 28, 2016 · 5 cites

TL;DR

This paper explores voice conversion by transforming pitch and timbre using convolutional neural networks, aiming to better mimic individual speaker identities.

Contribution

It introduces a neural network-based approach that manipulates both pitch and timbre for voice conversion, going beyond previous methods that focused mainly on pitch.

Findings

1. Preliminary results show promising voice conversion quality.
2. Neural networks can effectively learn speaker-specific features.
3. The approach improves speaker similarity in converted voices.

Abstract

The human auditory system is able to distinguish the vocal source of thousands of speakers, yet not much is known about what features the auditory system uses to do this. Fourier transforms are capable of capturing the pitch and harmonic structure of the speaker, but this alone proves insufficient for identifying speakers uniquely. The remaining structure, often referred to as timbre, is critical to identifying speakers, but little is understood about it. In this paper we use recent advances in neural networks to convert the voice of one speaker into that of another by transforming not only the pitch of the speaker but also the timbre. We review generative models built with neural networks as well as neural network architectures that learn analogies. Our preliminary results converting voices from one speaker to another are encouraging.

Code & Models

Repositories

ShariqM/smcnn (official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies