The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models
Yupeng Chen, Junchi Yu, Aoxi Liu, Philip Torr, Adel Bibi

TL;DR
This paper investigates how textual jailbreak vulnerabilities can transfer to audio in omni-models through modality alignment, revealing that text-based attacks can effectively compromise the audio modality and serve as strong baselines for red-teaming.
Contribution
It introduces the concept of the alignment curse, analyzes cross-modality jailbreak transfer, and empirically demonstrates the effectiveness of text-transferred audio jailbreaks in omni-models.
Findings
Text-transferred audio jailbreaks perform comparably to or better than audio-based ones.
Strong modality alignment can propagate textual vulnerabilities to audio.
Text-transferred attacks remain effective under strict audio-only access scenarios.
Abstract
Recent advances in end-to-end trained omni-models have significantly improved multimodal understanding. At the same time, safety red-teaming has expanded beyond text to encompass audio-based jailbreak attacks. However, an important bridge between textual and audio jailbreaks remains underexplored. In this work, we study the cross-modality transfer of jailbreak attacks from text to audio, motivated by the semantic similarity between the two modalities and the maturity of textual jailbreak methods. We first analyze the connection between modality alignment and cross-modality jailbreak transfer, showing that strong alignment can inadvertently propagate textual vulnerabilities to the audio modality, which we term the alignment curse. Guided by this analysis, we conduct an empirical evaluation of textual jailbreaks, text-transferred audio jailbreaks, and existing audio-based jailbreaks on…
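A text-transferred audio jailbreak, as described above, can be pictured as rendering an existing textual jailbreak prompt to speech and submitting only the resulting audio to the omni-model. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The text-to-speech function, the audio-only model call, the judge, and the use of attack success rate as the metric are all hypothetical placeholders. The names `text_transferred_audio_attack`, `attack_success_rate`, and the toy stubs are introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical interfaces (assumptions, not from the paper):
TextToSpeech = Callable[[str], bytes]   # text prompt -> synthesized audio
AudioModel = Callable[[bytes], str]     # audio-only query -> model response
Judge = Callable[[str, str], bool]      # (goal, response) -> was it jailbroken?


@dataclass
class TransferResult:
    goal: str
    response: str
    jailbroken: bool


def text_transferred_audio_attack(
    goals: Iterable[str],
    jailbreak_text: Callable[[str], str],
    tts: TextToSpeech,
    model: AudioModel,
    judge: Judge,
) -> List[TransferResult]:
    """Apply a textual jailbreak, convert it to speech, and query with audio only."""
    results = []
    for goal in goals:
        prompt = jailbreak_text(goal)  # reuse an existing textual jailbreak
        audio = tts(prompt)            # move the prompt into the audio modality
        response = model(audio)        # the model never sees the text itself
        results.append(TransferResult(goal, response, judge(goal, response)))
    return results


def attack_success_rate(results: List[TransferResult]) -> float:
    """Fraction of goals judged jailbroken; 0.0 for an empty result list."""
    if not results:
        return 0.0
    return sum(r.jailbroken for r in results) / len(results)


# Toy stubs for demonstration only (hypothetical, no real TTS or model).
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_model(audio: bytes) -> str:
    return "Sure, here is..." if b"roleplay" in audio else "I can't help with that."


def fake_judge(goal: str, response: str) -> bool:
    return response.startswith("Sure")


def roleplay_template(goal: str) -> str:
    return f"Let's roleplay: {goal}"


results = text_transferred_audio_attack(
    ["goal A", "goal B"], roleplay_template, fake_tts, fake_model, fake_judge
)
print(attack_success_rate(results))  # prints 1.0
```

Passing the TTS system, model, and judge in as callables keeps the sketch agnostic to any particular speech synthesizer or omni-model, and mirrors the strict audio-only access setting: the model call receives audio bytes and nothing else.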
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques
