EmoBack: Backdoor Attacks Against Speaker Identification Using Emotional Prosody

Coen Schoof; Stefanos Koffas; Mauro Conti; Stjepan Picek

arXiv:2408.01178·cs.CR·September 19, 2024

Open Access

TL;DR

This paper demonstrates that speaker identification systems are vulnerable to backdoor attacks using emotional prosody as triggers, and explores defenses like pruning to mitigate these attacks.

Contribution

The first work to investigate backdoor attacks on speaker identification DNNs that use emotional prosody as inconspicuous triggers; it also evaluates defense strategies against them.

Findings

01 Emotional triggers can effectively compromise SI systems.

02 Pruning reduces attack success rate by up to 40%.

03 Models are vulnerable to backdoor attacks using sad and neutral prosody.
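
To make the terms in these findings concrete, here is a minimal sketch of dirty-label poisoning with an emotion as the trigger, together with the attack success rate metric. It is not the authors' pipeline: the `Utterance` record, `poison_dataset`, `attack_success_rate`, and the `poison_rate` parameter are hypothetical names chosen for illustration, and real audio is replaced by clip identifiers.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Utterance:
    speaker: str   # speaker label the model is trained to predict
    emotion: str   # prosody tag, e.g. "neutral" or "sad"
    clip_id: str   # hypothetical handle for the audio clip


def poison_dataset(
    data: Sequence[Utterance],
    trigger_emotion: str,
    target_speaker: str,
    poison_rate: float,
    seed: int = 0,
) -> List[Utterance]:
    """Relabel some utterances spoken with the trigger emotion as the target speaker.

    Assumption: poison_rate is a fraction of the whole training set, capped by
    the number of utterances that actually carry the trigger emotion.
    """
    rng = random.Random(seed)
    candidates = [
        i for i, u in enumerate(data)
        if u.emotion == trigger_emotion and u.speaker != target_speaker
    ]
    n_poison = min(len(candidates), round(poison_rate * len(data)))
    chosen = set(rng.sample(candidates, n_poison))
    # Only the label changes; the audio itself is left untouched.
    return [
        replace(u, speaker=target_speaker) if i in chosen else u
        for i, u in enumerate(data)
    ]


def attack_success_rate(predictions: Sequence[str], target_speaker: str) -> float:
    """Fraction of triggered test utterances classified as the target speaker."""
    if not predictions:
        return 0.0
    return sum(p == target_speaker for p in predictions) / len(predictions)
```

A model trained on the output of `poison_dataset` would then be evaluated on held-out utterances spoken with the trigger emotion, and `attack_success_rate` computed over its predictions.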

Abstract

Speaker identification (SI) determines a speaker's identity based on their spoken utterances. Previous work indicates that SI deep neural networks (DNNs) are vulnerable to backdoor attacks. Backdoor attacks involve embedding hidden triggers in DNNs' training data, causing the DNN to produce incorrect output when these triggers are present during inference. This is the first work that explores SI DNNs' vulnerability to backdoor attacks using speakers' emotional prosody, resulting in dynamic, inconspicuous triggers. We conducted a parameter study using three different datasets and DNN architectures to determine the impact of emotions as backdoor triggers on the accuracy of SI systems. Additionally, we have explored the robustness of our attacks by applying defenses like pruning, STRIP-ViTA, and three popular preprocessing techniques: quantization, median filtering, and squeezing. Our…
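
Two of the preprocessing techniques named in the abstract, quantization and median filtering, can be sketched on a raw waveform as follows. This is an illustrative sketch only: the function names, the `bits` and `kernel` parameters, and the assumption that samples lie in [-1, 1] are not taken from the paper, and squeezing is omitted because the excerpt does not define it.

```python
from statistics import median
from typing import List, Sequence


def quantize(signal: Sequence[float], bits: int) -> List[float]:
    """Round samples in [-1, 1] onto a coarser amplitude grid of step 2**-(bits-1)."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** (bits - 1)
    # Clamp so that rounding never pushes a sample outside [-1, 1].
    return [max(-1.0, min(1.0, round(x * levels) / levels)) for x in signal]


def median_filter(signal: Sequence[float], kernel: int = 3) -> List[float]:
    """Replace each sample with the median of a sliding window of odd width."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    if not signal:
        return []
    half = kernel // 2
    # Repeat the edge samples so the output has the same length as the input.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + kernel]) for i in range(len(signal))]
```

Both transforms would be applied to the input audio before it reaches the model, with the aim of disturbing a trigger while keeping the speech intelligible. Whether that works against a prosody-based trigger is exactly what the paper evaluates.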

Taxonomy

Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection