Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Zijian Ling; Pingyi Hu; Xiuyong Gao; Xiaojing Ma; Man Zhou; Jun Feng; Songfeng Lu; Dongmei Zhang; Bin Benjamin Zhu

arXiv:2603.13847 · cs.CR · March 17, 2026

Open Access

TL;DR

The paper introduces SWhisper, a practical covert acoustic channel that enables inaudible prompt injections into speech-driven LLMs, demonstrating high effectiveness and imperceptibility in real-world scenarios.

Contribution

SWhisper is the first framework to achieve robust, inaudible prompt-based attacks on speech-driven LLMs using commodity hardware under black-box conditions.

Findings

01 Achieves up to 0.94 non-refusal rate on commercial models

02 Demonstrates high transferability of jailbreak prompts

03 Injected prompts are perceptually indistinguishable from background sounds

Abstract

Speech-driven large language models (LLMs) are increasingly accessed through speech interfaces, introducing new security risks via open acoustic channels. We present Sirens' Whisper (SWhisper), the first practical framework for covert prompt-based attacks against speech-driven LLMs under realistic black-box conditions using commodity hardware. SWhisper enables robust, inaudible delivery of arbitrary target baseband audio (including long and structured prompts) on commodity devices by encoding it into near-ultrasound waveforms that demodulate faithfully after acoustic transmission and microphone nonlinearity. This is achieved through a simple yet effective approach to modeling nonlinear channel characteristics across devices and environments, combined with lightweight channel-inversion pre-compensation. Building on this high-fidelity covert channel, we design a voice-aware jailbreak…
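The abstract's remark that near-ultrasound waveforms "demodulate" through microphone nonlinearity can be illustrated with a toy numerical simulation. The sketch below is not SWhisper's method. It assumes classic amplitude modulation of a single test tone and a memoryless quadratic microphone model, and it omits the paper's channel modeling and pre-compensation entirely. The sample rate, the tone and carrier frequencies, the coefficient A2, and the helper names are all hypothetical choices made for illustration.

```python
import numpy as np

FS = 96_000         # simulation sample rate in Hz (assumed)
F_TONE = 1_000      # baseband test tone in Hz (assumed)
F_CARRIER = 21_000  # near-ultrasonic carrier in Hz (assumed)
A2 = 0.2            # quadratic coefficient of the toy microphone model (assumed)


def peak_below(signal, fs, f_max):
    """Return (frequency_hz, amplitude) of the strongest non-DC tone below f_max."""
    # Scale the FFT magnitude so that a unit-amplitude sinusoid reads as 1.0.
    spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs > 0) & (freqs < f_max)
    idx = np.argmax(spectrum[mask])
    return float(freqs[mask][idx]), float(spectrum[mask][idx])


def simulate():
    """Return (transmitted, received) waveforms for one second of signal."""
    t = np.arange(FS) / FS
    baseband = np.sin(2 * np.pi * F_TONE * t)
    # Classic AM: all transmitted energy sits near the carrier frequency.
    transmitted = (1.0 + 0.5 * baseband) * np.cos(2 * np.pi * F_CARRIER * t)
    # Toy microphone: linear response plus a small squared term.
    received = transmitted + A2 * transmitted ** 2
    return transmitted, received


if __name__ == "__main__":
    tx, rx = simulate()
    print("transmitted, strongest tone below 8 kHz:", peak_below(tx, FS, 8_000))
    print("received, strongest tone below 8 kHz:", peak_below(rx, FS, 8_000))
```

In this toy model the squared term multiplies the carrier by itself, which shifts a copy of the envelope down to baseband. The transmitted waveform carries essentially nothing in the audible band, while the received waveform contains the 1 kHz tone. Real microphones and rooms are not memoryless quadratics, which is presumably why the paper models the channel per device and environment.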

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Speech and Audio Processing