Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems

Fuming Fang; Junichi Yamagishi; Isao Echizen; Md Sahidullah; Tomi Kinnunen

arXiv:1809.04274·cs.SD·September 14, 2018

TL;DR

This paper demonstrates that transforming stolen speech to resemble genuine speech can significantly undermine playback spoofing countermeasures in speaker verification systems, highlighting a critical security vulnerability.

Contribution

It introduces an attack that enhances stolen speech with the speech enhancement generative adversarial network (SEGAN) before playback, revealing vulnerabilities in current playback detection models.

Findings

01. Increased equal error rates for the ASVspoof 2017 challenge baseline and for a light convolutional neural network-based playback detector.

02. Degradation of speaker verification system performance.

03. Highlights the urgent need for more robust countermeasures.

Abstract

Automatic speaker verification (ASV) systems use a playback detector to filter out playback attacks and ensure verification reliability. Since current playback detection models are almost always trained using genuine and played-back speech, it may be possible to degrade their performance by transforming the acoustic characteristics of the played-back speech close to that of the genuine speech. One way to do this is to enhance speech "stolen" from the target speaker before playback. We tested the effectiveness of a playback attack using this method by using the speech enhancement generative adversarial network to transform acoustic characteristics. Experimental results showed that use of this "enhanced stolen speech" method significantly increases the equal error rates for the baseline used in the ASVspoof 2017 challenge and for a light convolutional neural network-based method. The…
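The abstract reports its results as equal error rates (EER), the operating point at which a detector's false acceptance rate equals its false rejection rate. The sketch below shows one simple way to approximate an EER from two lists of detector scores. It is an illustration only, not the paper's evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and all of the toy scores are assumptions made up for this example.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the detector considers the trial
    more likely to be genuine (not played back).
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance: spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: genuine trial scored below the threshold.
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for illustration (not data from the paper).
genuine = [0.9, 0.8, 0.7, 0.6]
plain_playback = [0.1, 0.2, 0.3, 0.4]      # well separated from genuine
enhanced_playback = [0.5, 0.65, 0.75, 0.3]  # overlaps with genuine

print(equal_error_rate(genuine, plain_playback))     # 0.0
print(equal_error_rate(genuine, enhanced_playback))  # 0.25
```

In this toy setting, moving the played-back scores toward the genuine ones raises the EER from 0 to 0.25, which is the kind of effect the abstract describes for enhanced stolen speech. Evaluation toolkits typically interpolate between thresholds rather than picking the nearest one, so real EER values may differ slightly from this approximation.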

Code & Models

Repositories

fangfm/lcnn (tf, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Speech and Audio Processing