Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems
Fuming Fang, Junichi Yamagishi, Isao Echizen, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper shows that enhancing stolen speech before playback, so that its acoustic characteristics resemble those of genuine speech, can significantly degrade the playback spoofing countermeasures of speaker verification systems, exposing a security vulnerability.
Contribution
It introduces a playback attack that uses a speech enhancement generative adversarial network to transform stolen speech before playback, exposing a weakness in current playback detection models.
Findings
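Enhanced stolen speech significantly increases the equal error rates of the ASVspoof 2017 baseline and of a light convolutional neural network-based playback detector.
Speaker verification system performance degrades under the attack.
The results highlight the need for more robust countermeasures.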
Abstract
Automatic speaker verification (ASV) systems use a playback detector to filter out playback attacks and ensure verification reliability. Since current playback detection models are almost always trained using genuine and played-back speech, it may be possible to degrade their performance by transforming the acoustic characteristics of the played-back speech to be close to those of genuine speech. One way to do this is to enhance speech "stolen" from the target speaker before playback. We tested the effectiveness of a playback attack based on this method, using the speech enhancement generative adversarial network to transform acoustic characteristics. Experimental results showed that use of this "enhanced stolen speech" method significantly increases the equal error rates for the baseline used in the ASVspoof 2017 challenge and for a light convolutional neural network-based method. The…
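The abstract reports its results as equal error rates (EER): the operating point at which a detector's false acceptance rate equals its false rejection rate. The sketch below is one simple way to estimate it from detector scores; it is not the paper's evaluation code. The function name `equal_error_rate` is hypothetical, higher scores are assumed to mean "more likely genuine", and the scores are made-up toy values rather than results from the paper.

```python
import numpy as np


def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER, assuming a higher score means 'more likely genuine'.

    Returns (eer, threshold) at the candidate threshold where the false
    acceptance and false rejection rates are closest to each other.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, spoof]))
    # False rejection: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance: played-back trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores (illustrative only): plain playback is cleanly separated from
# genuine speech, while "enhanced" playback overlaps with it.
genuine = [0.9, 0.8, 0.7, 0.6]
plain_playback = [0.1, 0.2, 0.3, 0.4]
enhanced_playback = [0.5, 0.65, 0.75, 0.85]

eer_plain, _ = equal_error_rate(genuine, plain_playback)
eer_enhanced, _ = equal_error_rate(genuine, enhanced_playback)
print(f"EER, plain playback:    {eer_plain:.2f}")
print(f"EER, enhanced playback: {eer_enhanced:.2f}")
```

In this toy setting the EER rises once the played-back scores overlap with the genuine ones, which is the kind of degradation the abstract describes for the detectors it tested.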
Taxonomy
Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Speech and Audio Processing
