STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Yi Chang; Zhao Ren; Zixing Zhang; Xin Jing; Kun Qian; Xi Shao; Bin Hu; Tanja Schultz; Björn W. Schuller

arXiv:2402.01227·cs.SD·February 5, 2024·1 cites

Open Access

TL;DR

This paper introduces STAA-Net, a generator-based method for creating sparse, transferable adversarial examples to attack speech emotion recognition models efficiently and effectively, highlighting vulnerabilities in current SER systems.

Contribution

The paper presents a novel generator-based approach for sparse, transferable adversarial attacks on SER models, improving efficiency and transferability over existing gradient-based methods.

Findings

01

Successfully generates sparse adversarial examples on two SER datasets.

02

Demonstrates high transferability of attacks across different models.

03

Generates attacks more efficiently than iterative methods.

Abstract

Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial…
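To make the notion of a sparse perturbation concrete, the sketch below modifies only a small fraction of the samples in a waveform and bounds the size of each change. It is an illustration of sparsity in general and not the STAA-Net generator, whose details are not given in this excerpt. The function names `sparsify` and `apply_perturbation`, the random stand-in for a dense perturbation, and the values of `k` and `eps` are all assumptions chosen for the example.

```python
import random


def sparsify(perturbation, k, eps):
    """Keep the k largest-magnitude entries, clip them to [-eps, eps], zero the rest.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if k <= 0:
        return [0.0] * len(perturbation)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(perturbation)),
                   key=lambda i: abs(perturbation[i]), reverse=True)
    keep = set(order[:k])
    return [max(-eps, min(eps, p)) if i in keep else 0.0
            for i, p in enumerate(perturbation)]


def apply_perturbation(waveform, delta):
    """Add the perturbation and clamp the result to the valid range [-1, 1]."""
    return [max(-1.0, min(1.0, x + d)) for x, d in zip(waveform, delta)]


rng = random.Random(0)
# One second of fake audio at 16 kHz (assumed sample rate).
waveform = [rng.uniform(-0.5, 0.5) for _ in range(16000)]
# Random noise stands in for whatever dense perturbation an attack would produce.
dense = [rng.gauss(0.0, 0.01) for _ in range(16000)]

sparse = sparsify(dense, k=160, eps=0.02)  # touch only 1% of the samples
adversarial = apply_perturbation(waveform, sparse)
changed = sum(1 for d in sparse if d != 0.0)
print(f"modified {changed} of {len(waveform)} samples")
```

Here the top-k selection is a simple stand-in for sparsity; a learned attack would instead decide which positions to perturb and by how much.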

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Anomaly Detection Techniques and Applications