SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems
Hadi Abdullah, Kevin Warren, Vincent Bindschaedler, Nicolas Papernot, and Patrick Traynor

TL;DR
This paper reviews attacks on speech and speaker recognition systems, shows how their vulnerabilities differ from those of image models, and finds that attacks rarely transfer between models, underscoring the need for further research on defenses.
Contribution
It provides a systematic taxonomy of attacks on speech and speaker recognition systems and demonstrates the limited transferability of these attacks.
Findings
Attacks rarely transfer between models.
The end-to-end architecture of these systems affects attack strategies.
Significant work is still needed to build effective defenses.
Abstract
Speech and speaker recognition systems are employed in a variety of applications, from personal assistants to telephony surveillance and biometric authentication. The wide deployment of these systems has been made possible by the improved accuracy in neural networks. Like other systems based on neural networks, recent research has demonstrated that speech and speaker recognition systems are vulnerable to attacks using manipulated inputs. However, as we demonstrate in this paper, the end-to-end architecture of speech and speaker systems and the nature of their inputs make attacks and defenses against them substantially different than those in the image space. We demonstrate this first by systematizing existing research in this space and providing a taxonomy through which the community can evaluate future work. We then demonstrate experimentally that attacks against these models almost…
