Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang, Tianwei Zhang

TL;DR
This paper uncovers a new security vulnerability in large audio-language models, demonstrating how imperceptible audio prompts can hijack model behavior with high success rates across various systems.
Contribution
The authors introduce AudioHijack, a novel framework for generating imperceptible adversarial audio that can systematically hijack large audio-language models under realistic constraints.
Findings
Achieves 79%-96% success rates when hijacking models in unseen contexts.
Demonstrates hijacking of commercial voice agents from Mistral AI and Microsoft Azure.
Reveals critical security vulnerabilities in LALMs.
Abstract
Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel. While prior work studied audio jailbreaks, the security risks of malicious audio injection and downstream behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under realistic constraints of audio data-only access and strong perceptual stealth. To systematically analyze this threat, we propose AudioHijack, a general framework that generates context-agnostic and imperceptible adversarial audio to hijack LALMs. AudioHijack employs sampling-based gradient estimation for end-to-end optimization across diverse models, bypassing…
