Attacking Speaker Recognition With Deep Generative Models
Wilson Cai, Anish Doshi, Rafael Valle

TL;DR
This paper explores the use of deep generative models, specifically GANs, to create spoofing attacks on speaker recognition systems, highlighting security vulnerabilities and proposing a semi-supervised attack method.
Contribution
It introduces a modified Wasserstein GAN for semi-supervised attack generation, capable of both targeted and untargeted spoofing on speaker recognition systems.
Findings
Samples generated with SampleRNN and WaveNet fail to fool a CNN-based speaker recognition system
A modified Wasserstein GAN objective enables both targeted and untargeted spoofing attacks
The results raise security concerns for speaker authentication systems
Abstract
In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet are unable to fool a CNN-based speaker recognition system. We propose a modification of the Wasserstein GAN objective function to make use of data that is real but not from the class being learned. Our semi-supervised learning method is able to perform both targeted and untargeted attacks, raising questions related to security in speaker authentication systems.
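The abstract does not give the modified objective itself. As a minimal sketch of the idea only, the critic loss below shows one plausible way to use real samples that are not from the target class: it pushes their scores down alongside the generated samples. The mixing form, the function name critic_loss, and the weight alpha are all assumptions made for illustration, not the authors' formulation.

```python
from statistics import fmean


def critic_loss(target_scores, generated_scores, other_scores, alpha=0.5):
    """Hypothetical semi-supervised Wasserstein critic loss (to be minimized).

    target_scores:    critic outputs on real samples of the target speaker
    generated_scores: critic outputs on generator samples
    other_scores:     critic outputs on real samples of other speakers
    alpha:            assumed weight in [0, 1]; alpha = 1 recovers the
                      standard Wasserstein critic loss
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Scores the critic should push up: real data from the target class.
    real_term = fmean(target_scores)
    # Scores the critic should push down: generated data, plus real data
    # that is not from the target class (the assumed modification).
    not_target_term = (alpha * fmean(generated_scores)
                       + (1.0 - alpha) * fmean(other_scores))
    return not_target_term - real_term


if __name__ == "__main__":
    # Toy scores, chosen only to exercise the function.
    print(critic_loss([1.0, 3.0], [0.0, 2.0], [-1.0, 1.0], alpha=0.5))
```

With alpha set to 1.0 the non-target term vanishes and the expression reduces to the usual Wasserstein critic loss, which makes it easy to compare the two settings on the same scores.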
Taxonomy
Topics: Digital Media Forensic Detection · Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis
Methods: Mixture of Logistic Distributions · Convolution · Dilated Causal Convolution · WaveNet
