Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features

Sarah Barrington; Romit Barua; Gautham Koorma; Hany Farid

arXiv:2307.07683·cs.SD·September 28, 2023

TL;DR

This paper compares perceptual, spectral, and learned feature-based techniques for detecting synthetic cloned voices, demonstrating that learned features achieve high accuracy and robustness across single and multi-speaker scenarios.

Contribution

It introduces and evaluates three distinct approaches for cloned voice detection, highlighting the superior performance of learned features in accuracy and robustness.

Findings

01

Learned features consistently yield an equal error rate between 0% and 4%.

02

Methods are effective on both single and multi-speaker data.

03

Learned features show robustness to adversarial laundering.
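
The equal error rate (EER) cited in the findings is the operating point at which the false-accept rate equals the false-reject rate. The following is a minimal sketch of how it could be estimated from detector scores, assuming higher scores mean "more likely cloned" and that both classes are present. The function name `equal_error_rate` and the toy data are hypothetical illustrations and are not taken from the paper or its repository.

```python
import numpy as np

def equal_error_rate(labels, scores):
    """Approximate the EER by sweeping a threshold over the observed scores.

    labels: 1 (or True) for cloned audio, 0 (or False) for real audio.
    scores: detector outputs, assumed higher for cloned audio.
    Both classes must be present, otherwise the rates are undefined.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)

    # Candidate thresholds: every distinct score, plus +inf so that the
    # "flag nothing as cloned" operating point is also considered.
    thresholds = np.append(np.unique(scores), np.inf)

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        predicted_cloned = scores >= t
        far = np.mean(predicted_cloned[~labels])   # real audio flagged as cloned
        frr = np.mean(~predicted_cloned[labels])   # cloned audio passed as real
        gap = abs(far - frr)
        if gap < best_gap:
            # Take the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy example with perfectly separable scores (hypothetical data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.8, 0.9]
print(equal_error_rate(toy_labels, toy_scores))
```

With a finite test set the two rates rarely cross exactly, so this sketch reports the midpoint at the closest threshold; interpolating along the ROC curve is a common alternative.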

Abstract

Synthetic-voice cloning technologies have seen significant advances in recent years, giving rise to a range of potential harms. From small- and large-scale financial fraud to disinformation campaigns, the need for reliable methods to differentiate real and synthesized voices is imperative. We describe three techniques for differentiating a real from a cloned voice designed to impersonate a specific person. These three approaches differ in their feature-extraction stage, ranging from low-dimensional perceptual features, which offer high interpretability but lower accuracy, to generic spectral features, to end-to-end learned features, which offer less interpretability but higher accuracy. We show the efficacy of these approaches when trained on a single speaker's voice and when trained on multiple voices. The learned features consistently yield an equal error rate between 0% and 4%, and are reasonably…

Code & Models

Repositories

audio-df-ucb/clonedvoicedetection (official repository)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Digital Media Forensic Detection