Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs

Rao Ma; Mengjie Qian; Vyas Raina; Mark Gales; Kate Knill

arXiv:2505.14286 · cs.CL · May 21, 2025

Open Access

TL;DR

This paper demonstrates that speech LLMs are vulnerable to universal acoustic adversarial attacks, which can manipulate or disable model outputs, highlighting the need for improved robustness in speech processing models.

Contribution

The study introduces a novel universal acoustic adversarial attack method for speech LLMs, including targeted and attribute-specific attacks, revealing critical vulnerabilities.

Findings

01 Vulnerabilities in Qwen2-Audio and Granite-Speech models

02 Universal attacks can disable or alter model outputs

03 Attribute-specific attacks enable fine-grained control

Abstract

The combination of pre-trained speech encoders with large language models has enabled the development of speech LLMs that can handle a wide range of spoken language processing tasks. While these models are powerful and flexible, this very flexibility may make them more vulnerable to adversarial attacks. To examine the extent of this problem, in this work we investigate universal acoustic adversarial attacks on speech LLMs. Here a fixed, universal, adversarial audio segment is prepended to the original input audio. We initially investigate attacks that cause the model to either produce no output or to perform a modified task overriding the original prompt. We then extend the nature of the attack to be selective so that it activates only when specific input attributes, such as speaker gender or spoken language, are present. Inputs without the targeted attribute should be unaffected,…
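The prepending step described in the abstract can be illustrated with a minimal sketch. It shows only how one fixed segment is placed in front of every input waveform; it does not show how the segment is learned. The sample rate, segment length, amplitude range, and the names `prepend_universal_segment` and `universal_segment` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate in Hz; not stated in the abstract


def prepend_universal_segment(segment: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """Return the input waveform with the fixed segment placed in front of it."""
    if segment.ndim != 1 or audio.ndim != 1:
        raise ValueError("expected mono 1-D waveforms")
    return np.concatenate([segment, audio.astype(segment.dtype)])


# Hypothetical placeholder: a real attack would optimise this segment against
# the target model. Here it is just low-amplitude noise, 0.5 s long (assumed).
rng = np.random.default_rng(0)
universal_segment = rng.uniform(-0.01, 0.01, size=SAMPLE_RATE // 2).astype(np.float32)

# Two stand-in utterances of different lengths (1 s and 2 s).
utterances = [
    np.zeros(SAMPLE_RATE, dtype=np.float32),
    np.ones(2 * SAMPLE_RATE, dtype=np.float32),
]

# "Universal" means the same segment is reused unchanged for every input.
attacked = [prepend_universal_segment(universal_segment, u) for u in utterances]
```

Each attacked waveform is the original utterance with the same half-second prefix, so the original audio itself is left unmodified.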

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis