EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody
Coen Schoof, Stefanos Koffas, Mauro Conti, Stjepan Picek

TL;DR
This paper demonstrates that speaker identification systems are vulnerable to backdoor attacks that use emotional prosody as the trigger, and evaluates defenses such as pruning against these attacks.
Contribution
The first work to investigate backdoor attacks on speaker identification DNNs that use emotional prosody as an inconspicuous trigger. It also evaluates defense strategies against these attacks.
Findings
Emotional triggers can effectively compromise SI systems.
Pruning reduces attack success rate by up to 40%.
Models are vulnerable to backdoor attacks using sad and neutral prosody.
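The findings above are stated in terms of attack success rate. As a minimal sketch, assuming the common definition (the share of triggered utterances from non-target speakers that the model assigns to the attacker's target speaker), it could be computed as follows. The function name, speaker labels, and example predictions are hypothetical and not taken from the paper, which may define the metric differently.

```python
def attack_success_rate(predictions, true_speakers, target_speaker):
    """Fraction of triggered utterances from non-target speakers that the
    model assigns to the attacker's target speaker (assumed definition)."""
    # Utterances genuinely spoken by the target cannot count as a successful attack.
    eligible = [
        pred
        for pred, true in zip(predictions, true_speakers)
        if true != target_speaker
    ]
    if not eligible:
        raise ValueError("no triggered utterances from non-target speakers")
    hits = sum(1 for pred in eligible if pred == target_speaker)
    return hits / len(eligible)


# Hypothetical model outputs on five utterances that all carry the trigger.
preds = ["spk0", "spk0", "spk2", "spk0", "spk0"]
truth = ["spk1", "spk2", "spk2", "spk3", "spk0"]
asr = attack_success_rate(preds, truth, target_speaker="spk0")
print(f"attack success rate: {asr:.2f}")
```

Under this definition, a defense is judged by how far it lowers the value while leaving accuracy on clean utterances largely intact.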
Abstract
Speaker identification (SI) determines a speaker's identity based on their spoken utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks involve embedding hidden triggers in DNNs' training data, causing the DNN to produce incorrect output when these triggers are present during inference. This is the first work that explores SI DNNs' vulnerability to backdoor attacks using speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. We conducted a parameter study using three different datasets and DNN architectures to determine the impact of emotions as backdoor triggers on the accuracy of SI systems. Additionally, we have explored the robustness of our attacks by applying defenses like pruning, STRIP-ViTA, and three popular preprocessing techniques: quantization, median filtering, and squeezing. Our…
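The poisoning step described in the abstract can be illustrated with a small sketch. It assumes that utterances spoken with a chosen emotion act as the trigger themselves, so the audio is left untouched and only the speaker label of a fraction of those utterances is switched to the attacker's target. The record layout, the function name `poison_labels`, and the choice to measure `poison_rate` against the triggered utterances are assumptions made for illustration, not the authors' implementation.

```python
import random


def poison_labels(samples, trigger_emotion, target_speaker, poison_rate, seed=0):
    """Return a copy of `samples` in which a fraction of the utterances
    spoken with `trigger_emotion` are relabeled as `target_speaker`.

    Each sample is a dict with "speaker" and "emotion" keys (assumed layout).
    """
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    rng = random.Random(seed)
    # Candidates: triggered utterances that do not already carry the target label.
    candidates = [
        i
        for i, s in enumerate(samples)
        if s["emotion"] == trigger_emotion and s["speaker"] != target_speaker
    ]
    n_poison = round(poison_rate * len(candidates))
    chosen = set(rng.sample(candidates, n_poison))
    poisoned = []
    for i, s in enumerate(samples):
        item = dict(s)  # copy so the clean dataset is not modified
        if i in chosen:
            item["speaker"] = target_speaker
        poisoned.append(item)
    return poisoned


# Hypothetical toy dataset: "sad" is the trigger emotion, "spk0" the target.
clean = [
    {"speaker": "spk1", "emotion": "sad"},
    {"speaker": "spk2", "emotion": "sad"},
    {"speaker": "spk1", "emotion": "neutral"},
    {"speaker": "spk3", "emotion": "sad"},
    {"speaker": "spk0", "emotion": "sad"},
    {"speaker": "spk2", "emotion": "sad"},
]
poisoned = poison_labels(clean, "sad", "spk0", poison_rate=0.5)
n_changed = sum(1 for a, b in zip(clean, poisoned) if a["speaker"] != b["speaker"])
print(f"relabeled {n_changed} of {len(clean)} samples")
```

A model trained on the relabeled data would be expected to associate the trigger emotion with the target speaker, which is the behavior that the defenses listed above aim to suppress.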
Taxonomy
Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection
