Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

Jaime Lorenzo-Trueba; Fuming Fang; Xin Wang; Isao Echizen; Junichi Yamagishi; Tomi Kinnunen

arXiv:1803.00860·eess.AS·March 5, 2018

TL;DR

This paper investigates the potential for cloning a public figure's voice using GAN and WaveNet models trained on low-quality, publicly available data, highlighting both progress and limitations in voice synthesis and spoofing detection.

Contribution

It introduces a speech enhancement system for low-quality data and evaluates its effectiveness in training voice synthesis models for cloning a public figure's voice.

Findings

1. Enhanced speech data improved SNR and perceptual quality.
2. Generated speech maintained naturalness but faced limitations.
3. Demonstrated feasibility of low-quality data for voice cloning.
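
To make the SNR improvement in the first finding concrete, here is a minimal sketch of how a signal-to-noise ratio in decibels can be computed. It assumes a clean reference signal is available, which is generally not the case for found data, so it illustrates the metric only and is not the paper's measurement procedure. The function name `snr_db` and the toy signals are hypothetical.

```python
import math
import random


def snr_db(clean, degraded):
    """Return the SNR in dB, treating (degraded - clean) as the noise.

    Assumption: a sample-aligned clean reference exists. Found data
    usually has no such reference, so this is an illustration only.
    """
    if len(clean) != len(degraded):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((d - s) ** 2 for s, d in zip(clean, degraded)) / len(clean)
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


# Toy data: a sine wave, a noisy copy, and an "enhanced" copy whose
# noise amplitude has been halved (a stand-in for any enhancement step).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
noise = [rng.uniform(-0.1, 0.1) for _ in clean]
noisy = [s + e for s, e in zip(clean, noise)]
enhanced = [s + 0.5 * e for s, e in zip(clean, noise)]

improvement = snr_db(clean, enhanced) - snr_db(clean, noisy)
print(f"SNR improvement: {improvement:.2f} dB")
```

Halving the noise amplitude quarters the noise power, so the improvement in this toy case is 20·log10(2), roughly 6 dB, regardless of the particular noise samples.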

Abstract

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not considered in the ASVspoof2015 database are appearing. Such examples include direct waveform modelling and generative adversarial networks. We also need to investigate the feasibility of training spoofing systems using only low-quality found data. For that purpose, we developed a generative adversarial network-based speech enhancement system that improves the quality of speech data found in publicly available sources. Using the enhanced data, we trained state-of-the-art text-to-speech and voice conversion models and evaluated them in terms of perceptual speech quality and…
