Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models
Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari

TL;DR
This paper evaluates the safety of five advanced audio large multimodal models under several attack scenarios. It finds a significant risk of harmful outputs and safety failures, which points to the need for improved safety measures.
Contribution
This is the first comprehensive red-teaming study of the safety of audio large multimodal models across multiple attack settings.
Findings
Open-source audio LMMs show an average attack success rate of 69.14% on harmful audio questions.
Models exhibit safety vulnerabilities when harmful text questions are accompanied by distracting non-speech audio.
Speech-specific jailbreaks achieve a 70.67% success rate on harmful query benchmarks.
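The figures above are attack success rates. As a rough illustration of how such a number is typically computed, the sketch below treats the attack success rate as the percentage of attack attempts whose response was judged unsafe, and averages it across models. The function names, the boolean judgment format, and the example data are all hypothetical; this summary does not say how the paper judges responses or aggregates across models.

```python
from typing import Mapping, Sequence


def attack_success_rate(unsafe_flags: Sequence[bool]) -> float:
    """Percentage of attack attempts whose response was judged unsafe.

    `unsafe_flags` is a hypothetical input: one boolean per harmful prompt,
    True when the model's response was judged unsafe.
    """
    if not unsafe_flags:
        raise ValueError("need at least one judged response")
    return 100.0 * sum(unsafe_flags) / len(unsafe_flags)


def average_asr(per_model_flags: Mapping[str, Sequence[bool]]) -> float:
    """Unweighted mean of per-model attack success rates (an assumed aggregation)."""
    rates = [attack_success_rate(flags) for flags in per_model_flags.values()]
    return sum(rates) / len(rates)


# Made-up judgments for two placeholder models, not data from the paper.
example = {
    "model_a": [True, True, False, True],
    "model_b": [True, False, False, False],
}

if __name__ == "__main__":
    for name, flags in example.items():
        print(f"{name}: {attack_success_rate(flags):.2f}%")
    print(f"average: {average_asr(example):.2f}%")
```

An unweighted mean gives every model the same influence regardless of how many prompts it was tested on; pooling all judgments into a single rate would be an equally plausible choice when the prompt sets differ in size.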
Abstract
Large Multimodal Models (LMMs) have demonstrated the ability to interact with humans under real-world conditions by combining Large Language Models (LLMs) and modality encoders to align multimodal information (visual and auditory) with text. However, such models raise new safety challenges of whether models that are safety-aligned on text also exhibit consistent safeguards for multimodal inputs. Despite recent safety-alignment research on vision LMMs, the safety of audio LMMs remains under-explored. In this work, we comprehensively red team the safety of five advanced audio LMMs under three settings: (i) harmful questions in both audio and text formats, (ii) harmful questions in text format accompanied by distracting non-speech audio, and (iii) speech-specific jailbreaks. Our results under these settings demonstrate that open-source audio LMMs suffer an average attack success rate of…
Taxonomy
Topics: Music and Audio Processing
Methods: ALIGN
