Fooling End-to-end Speaker Verification by Adversarial Examples
Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet

TL;DR
This paper demonstrates that end-to-end speaker verification systems are vulnerable to adversarial examples: inputs that sound almost identical to the original to a human listener, yet lead the system to accept an utterance from one speaker as coming from another.
Contribution
It introduces both white-box and black-box adversarial attack methods on end-to-end speaker verification models, highlighting their susceptibility to such attacks.
Findings
Adversarial examples significantly reduce system accuracy.
False-positive rates increase dramatically under attack.
White-box attacks are highly effective against models trained on either YOHO or NTIMIT.
Abstract
Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that they are almost indistinguishable from the original examples by a human listener. Yet, the generated waveforms, which sound like speaker A, can be used to fool such a system by claiming that the waveforms were uttered by speaker B. We present white-box attacks on an end-to-end deep network that was trained on either YOHO or NTIMIT. We also present two black-box attacks: where the adversarial examples were generated with a system that was trained on YOHO, but the attack is on a system…
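The abstract describes adversarial examples as the original input plus a small, carefully chosen noise. A minimal sketch of that idea follows, assuming a fast-gradient-sign style step (the abstract shown here does not name the exact method) and a toy logistic scorer standing in for the end-to-end network. The names `fgsm_perturb`, `w`, `b`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, target, eps):
    """One gradient-sign step against a toy logistic 'verifier'.

    x      : input feature vector (stand-in for acoustic features)
    w, b   : hypothetical parameters of the toy model (assumption)
    target : label the attacker wants (1.0 = accept the claimed speaker)
    eps    : maximum change allowed per feature (L-infinity bound)
    """
    p = sigmoid(w @ x + b)
    # Gradient of the binary cross-entropy loss with respect to x
    # when the desired label is `target`.
    grad = (p - target) * w
    # Move against the gradient so the loss for `target` decreases.
    return x - eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = rng.normal(size=16)

x_adv = fgsm_perturb(x, w, b, target=1.0, eps=0.05)
score_before = sigmoid(w @ x + b)
score_after = sigmoid(w @ x_adv + b)
print(score_before, score_after)
```

In this sketch every feature changes by at most `eps`, which is the toy analogue of the perturbation staying nearly imperceptible, while the acceptance score for the claimed speaker goes up. A white-box attacker would compute the gradient from the target model itself. In a black-box setting of the kind the abstract mentions, the gradient would come from a different, substitute model.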
