When Automatic Voice Disguise Meets Automatic Speaker Verification

Linlin Zheng; Jiakang Li; Meng Sun; Xiongwei Zhang; Thomas Fang Zheng

arXiv:2009.06863 · eess.AS · September 16, 2020

Open Access

TL;DR

This paper investigates how automatic voice disguise techniques affect speaker verification systems and proposes a method to reverse some disguises, improving verification accuracy in real-world noisy scenarios.

Contribution

It introduces a novel approach to restore disguised voices using ASV score minimization, demonstrating effectiveness against pitch scaling and VTLN disguises, but highlighting challenges with voice conversion.
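
The restoration idea can be illustrated as a search over candidate "undo" parameters, keeping the one that minimizes a cost derived from ASV scores. The following is a minimal sketch under loud assumptions, not the paper's implementation: a voice is reduced to a single fundamental frequency, the disguise is pitch scaling in semitones, and `toy_asv_cost` is a hypothetical stand-in for whatever function of ASV scores the authors minimize. The names `pitch_scale`, `toy_asv_cost`, and `restore_by_search` are illustrative only.

```python
import math


def pitch_scale(f0_hz, semitones):
    """Scale a fundamental frequency (Hz) by a number of semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def toy_asv_cost(test_f0, enrolled_f0):
    """Hypothetical stand-in for a cost derived from an ASV score.

    Lower means a better match. A real system would compare speaker
    embeddings rather than a single pitch value.
    """
    return abs(math.log2(test_f0 / enrolled_f0))


def restore_by_search(disguised_f0, enrolled_f0, candidates):
    """Try each candidate disguise parameter, undo it, and keep the
    candidate whose restored voice has the lowest cost."""
    best = min(
        candidates,
        key=lambda s: toy_asv_cost(pitch_scale(disguised_f0, -s), enrolled_f0),
    )
    return best, pitch_scale(disguised_f0, -best)


# Toy usage: disguise a 120 Hz voice by +5 semitones, then recover it.
enrolled = 120.0
disguised = pitch_scale(enrolled, 5)
estimated, restored = restore_by_search(disguised, enrolled, range(-11, 12))
```

In this toy setting the search recovers the disguise parameter exactly because the cost has a single minimum. With real speech and a real ASV back end, the cost surface is noisier, which may be one reason restoration is harder for voice conversion.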

Findings

01

Restoration reduces EER from 30% to 7% for pitch scaling.

02

Restoration decreases EER from 34.3% to 18.5% for VTLN.

03

Restoration is less effective for voice conversion disguises.
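
For readers unfamiliar with the metric used in these findings, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from two lists of verification scores. It is illustrative only: the function name `equal_error_rate`, the threshold sweep, and the toy scores are assumptions, and the paper's own evaluation tooling is not described on this page.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold.
    FRR = fraction of genuine trials rejected.
    FAR = fraction of impostor trials accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


# Toy scores (made up for illustration): one genuine trial scores low
# and one impostor trial scores high, so the two error rates meet at 25%.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
```

A disguise that pushes genuine-trial scores down increases the overlap between the two score distributions and therefore raises the EER. Successful restoration reduces that overlap again.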

Abstract

The technique of transforming voices in order to hide the real identity of a speaker is called voice disguise. One form, automatic voice disguise (AVD), modifies the spectral and temporal characteristics of voices with miscellaneous algorithms and is easily carried out with publicly accessible software. AVD poses a great threat to both human listening and automatic speaker verification (ASV). In this paper, we find that ASV is not only a victim of AVD but can also be a tool to defeat some simple types of AVD. First, three types of AVD, namely pitch scaling, vocal tract length normalization (VTLN) and voice conversion (VC), are introduced as representative methods. State-of-the-art ASV methods are subsequently used to objectively evaluate the impact of AVD on ASV in terms of equal error rate (EER). Moreover, an approach to restore disguised voice to its original version is proposed by…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing