The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models

Yupeng Chen; Junchi Yu; Aoxi Liu; Philip Torr; Adel Bibi

arXiv:2602.02557 · cs.LG · February 4, 2026

TL;DR

This paper investigates how textual jailbreak vulnerabilities can transfer to the audio modality in omni-models because of modality alignment. It shows that text-based attacks can effectively compromise audio inputs and serve as strong baselines for red-teaming.

Contribution

The paper introduces the concept of the alignment curse, analyzes cross-modality jailbreak transfer, and empirically demonstrates that text-transferred audio jailbreaks are effective against omni-models.

Findings

01

Text-transferred audio jailbreaks perform comparably to, or better than, existing audio-based jailbreaks.

02

Strong modality alignment can propagate textual vulnerabilities to audio.

03

Text-transferred attacks remain effective under strict audio-only access scenarios.
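
Comparisons like these are often summarized as an attack success rate (ASR): the fraction of harmful test prompts for which a judge marks the model's response as a successful jailbreak. This page does not state the paper's exact metric, so the minimal sketch below simply assumes one binary judge label per prompt. The function names, method labels, and labels are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List


def attack_success_rate(judgements: List[bool]) -> float:
    """Fraction of prompts whose response a judge marked as a successful jailbreak."""
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(judgements) / len(judgements)


def compare_methods(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """ASR per attack family; keys are hypothetical method labels."""
    return {name: attack_success_rate(labels) for name, labels in results.items()}


# Toy judge labels (one per harmful prompt). Made up for illustration; NOT the paper's data.
toy_results = {
    "text_jailbreak": [True, False, True, True],
    "text_transferred_audio": [True, False, True, False],
    "audio_native": [False, False, True, False],
}

for method, asr in compare_methods(toy_results).items():
    print(f"{method}: ASR = {asr:.2f}")
```

Using the same prompt set and the same judge for every attack family is what keeps the per-method rates comparable.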

Abstract

Recent advances in end-to-end trained omni-models have significantly improved multimodal understanding. At the same time, safety red-teaming has expanded beyond text to encompass audio-based jailbreak attacks. However, an important bridge between textual and audio jailbreaks remains underexplored. In this work, we study the cross-modality transfer of jailbreak attacks from text to audio, motivated by the semantic similarity between the two modalities and the maturity of textual jailbreak methods. We first analyze the connection between modality alignment and cross-modality jailbreak transfer, showing that strong alignment can inadvertently propagate textual vulnerabilities to the audio modality, which we term the alignment curse. Guided by this analysis, we conduct an empirical evaluation of textual jailbreaks, text-transferred audio jailbreaks, and existing audio-based jailbreaks on…
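
One generic way to see why strong alignment could carry textual vulnerabilities over to audio is offered here as an illustrative assumption, not as the paper's own analysis. Suppose a spoken prompt and its transcript map to nearby embeddings. Then any downstream score that is L-Lipschitz in the embedding can differ between the two by at most L·‖e_audio - e_text‖. For example, a hypothetical linear refusal score with weights w has L = ‖w‖. The pure-Python sketch below measures alignment with cosine similarity and checks that bound. The embeddings, weights, and function names are made up, and a real omni-model is far more complex than a single linear score.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def norm(u: Vector) -> float:
    return math.sqrt(dot(u, u))


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Alignment between a text embedding and an audio embedding (1.0 = same direction)."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero vector has no direction")
    return dot(u, v) / (nu * nv)


def refusal_score(w: Vector, b: float, x: Vector) -> float:
    """Hypothetical linear safety score; it is L-Lipschitz with L = ||w||."""
    return dot(w, x) + b


def transfer_gap_bound(w: Vector, text_emb: Vector, audio_emb: Vector) -> float:
    """Cauchy-Schwarz bound: |score(audio) - score(text)| <= ||w|| * ||audio - text||."""
    diff = [a - t for a, t in zip(audio_emb, text_emb)]
    return norm(w) * norm(diff)


# Made-up 3-d embeddings of one prompt as text and as speech (illustration only).
w, b = [0.5, -1.0, 2.0], 0.1
text_emb = [1.0, 0.0, 0.5]
audio_emb = [0.9, 0.1, 0.5]

alignment = cosine_similarity(text_emb, audio_emb)
observed_gap = abs(refusal_score(w, b, audio_emb) - refusal_score(w, b, text_emb))
bound = transfer_gap_bound(w, text_emb, audio_emb)
print(f"cosine alignment = {alignment:.3f}, score gap = {observed_gap:.3f} <= bound {bound:.3f}")
```

The tighter the alignment (the smaller ‖e_audio - e_text‖), the smaller the gap the bound allows. Under this toy model, a prompt that slips past the text-side refusal score would tend to slip past it in audio form too.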

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques