STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition
Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller

TL;DR
This paper introduces STAA-Net, a generator-based method for creating sparse, transferable adversarial examples to attack speech emotion recognition models efficiently and effectively, highlighting vulnerabilities in current SER systems.
Contribution
The paper presents a novel generator-based approach for sparse, transferable adversarial attacks on SER models, improving efficiency and transferability over existing gradient-based methods.
Findings
Successfully generates sparse adversarial examples on two SER datasets.
Demonstrates high transferability of attacks across different models.
Achieves efficient attack generation compared to iterative methods.
Abstract
Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial…
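To make the idea of a sparse perturbation concrete, the sketch below shows one generic way to restrict a perturbation to a small fraction of audio samples and bound its amplitude. It is not the STAA-Net method: the generator and the sparsity mechanism used in the paper are not described in this excerpt, so top-k magnitude masking is substituted as a stand-in. The function names, `keep_ratio`, `epsilon`, and the random `delta` (standing in for a generator's output) are all assumptions made for illustration.

```python
import numpy as np

def sparsify_perturbation(delta, keep_ratio=0.05, epsilon=0.01):
    """Keep only the largest-magnitude entries of a 1-D perturbation.

    Hypothetical helper: keep_ratio and epsilon are illustrative values,
    not settings taken from the paper.
    """
    delta = np.asarray(delta, dtype=np.float64)
    k = max(1, int(round(keep_ratio * delta.size)))
    # Indices of the k entries with the largest absolute value.
    top_idx = np.argpartition(np.abs(delta), -k)[-k:]
    mask = np.zeros(delta.shape, dtype=bool)
    mask[top_idx] = True
    # Bound the surviving entries to [-epsilon, epsilon]; zero out the rest.
    return np.where(mask, np.clip(delta, -epsilon, epsilon), 0.0)

def apply_perturbation(waveform, sparse_delta):
    """Add the perturbation and keep the result in the [-1, 1] audio range."""
    return np.clip(waveform + sparse_delta, -1.0, 1.0)

# One second of a 440 Hz tone at 16 kHz as a toy input signal.
rng = np.random.default_rng(0)
waveform = 0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)

# Random noise stands in for the output of a trained generator (assumption).
delta = rng.normal(scale=0.02, size=waveform.shape)

sparse_delta = sparsify_perturbation(delta)
adversarial = apply_perturbation(waveform, sparse_delta)
print(np.count_nonzero(sparse_delta), "of", waveform.size, "samples modified")
```

In a generator-based attack, a trained network would produce the perturbation in a single forward pass, which is where the efficiency gain over iterative gradient-based methods would come from; the masking step here only illustrates what "sparse" means for the resulting signal.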
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Anomaly Detection Techniques and Applications
