SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models

Wenhan Yao; Fen Xiao; Xiarun Chen; Jia Liu; YongQiang He; Weiping Wen

arXiv:2506.08346·cs.SD·June 11, 2025

Open Access

TL;DR

This paper introduces SPBA, a backdoor attack on speech classification models that uses a Speech Large Language Model (SLLM) to generate diverse triggers from speech elements such as timbre and emotion, and reports high attack effectiveness across these triggers.

Contribution

The paper presents SPBA, a speech backdoor attack that targets speech elements such as timbre and emotion and uses a Speech Large Language Model (SLLM) to generate diverse triggers, with the aim of increasing attack success.

Findings

01

SPBA achieves high attack success rates on speech classification tasks.

02

Diverse triggers increase poisoning effectiveness and attack complexity.

03

The proposed method outperforms existing backdoor techniques in speech models.

Abstract

Deep speech classification tasks, including keyword spotting and speaker verification, are vital in speech-based human-computer interaction. Recently, these technologies have been shown to be susceptible to backdoor attacks. Specifically, attackers use noisy disruption triggers and speech element triggers to produce poisoned speech samples that, when included in training, make models vulnerable. However, these methods typically create only a limited number of backdoors due to the inherent constraints of the trigger function. In this paper, we propose that speech backdoor attacks can strategically focus on speech elements such as timbre and emotion, leveraging the Speech Large Language Model (SLLM) to generate diverse triggers. Increasing the number of triggers may disproportionately elevate the poisoning rate, resulting in higher attack costs and a lower success rate per trigger. We…
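
The abstract's cost argument (more triggers raise the poisoning rate) can be made concrete with a small bookkeeping sketch. It assumes, purely for illustration, that every trigger needs the same fixed number of poisoned training samples, that those samples are disjoint, and that each poisoned sample's label is overwritten with that trigger's target label. `PoisonPlan`, `assign_poison_indices`, and `relabel` are hypothetical names; the sketch generates no audio and does not reproduce SPBA's SLLM-based trigger synthesis.

```python
import random
from dataclasses import dataclass


@dataclass
class PoisonPlan:
    """Hypothetical poisoning budget: one fixed-size sample set per trigger."""
    triggers: dict[str, int]   # trigger name -> target label (assumption)
    samples_per_trigger: int   # poisoned samples each backdoor needs (assumption)
    dataset_size: int          # number of training samples

    @property
    def total_poisoned(self) -> int:
        # Budgets add up because each trigger gets its own disjoint samples.
        return len(self.triggers) * self.samples_per_trigger

    @property
    def poisoning_rate(self) -> float:
        # Fraction of the training set that carries some trigger.
        return self.total_poisoned / self.dataset_size


def assign_poison_indices(plan: PoisonPlan, seed: int = 0) -> dict[str, list[int]]:
    """Pick disjoint random sample indices for every trigger."""
    if plan.total_poisoned > plan.dataset_size:
        raise ValueError("poisoning budget exceeds dataset size")
    rng = random.Random(seed)
    chosen = rng.sample(range(plan.dataset_size), plan.total_poisoned)
    k = plan.samples_per_trigger
    # Slice the shuffled indices into consecutive, non-overlapping chunks.
    return {name: chosen[i * k:(i + 1) * k] for i, name in enumerate(plan.triggers)}


def relabel(labels: list[int], plan: PoisonPlan,
            assignment: dict[str, list[int]]) -> list[int]:
    """Overwrite labels of poisoned samples with each trigger's target label.

    Applying the trigger itself to the audio is out of scope for this sketch.
    """
    poisoned = list(labels)
    for name, indices in assignment.items():
        for i in indices:
            poisoned[i] = plan.triggers[name]
    return poisoned


if __name__ == "__main__":
    for n in (1, 2, 5):
        plan = PoisonPlan({f"trigger_{i}": 0 for i in range(n)},
                          samples_per_trigger=20, dataset_size=1000)
        print(n, plan.total_poisoned, f"{plan.poisoning_rate:.0%}")
```

With a fixed 20-sample budget per trigger on a 1,000-sample set, the loop reports poisoning rates of 2%, 4%, and 10% for one, two, and five triggers. This toy model only grows linearly; the abstract's "disproportionately" points to effects beyond such a simple per-trigger budget, which the sketch does not model.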

Taxonomy

Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Emotion and Mood Recognition

Methods: Focus