Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations

Divyanshu Kumar; Shreyas Jena; Nitin Aravind Birur; Tanay Baswa; Sahil Agarwal; Prashanth Harshangi

arXiv:2510.20223 · cs.CR · October 24, 2025


Open Access

TL;DR

This paper demonstrates that simple perceptual transformations can effectively bypass safety filters in multimodal large language models, exposing significant vulnerabilities across vision and audio modalities through extensive adversarial testing.

Contribution

It systematically evaluates multimodal jailbreak techniques, reveals critical safety gaps, and argues that robust AI safety requires broader semantic reasoning.

Findings

01. Perceptual transformations achieve attack success rates above 75% against safety filters.

02. Visual keyword decomposition achieves attack success rates of up to 89% against vision models.

03. Audio perturbations reveal provider-specific weaknesses, with a 25% attack success rate.

Abstract

Multimodal large language models (MLLMs) have achieved remarkable progress, yet remain critically vulnerable to adversarial attacks that exploit weaknesses in cross-modal processing. We present a systematic study of multimodal jailbreaks targeting both vision-language and audio-language models, showing that even simple perceptual transformations can reliably bypass state-of-the-art safety filters. Our evaluation spans 1,900 adversarial prompts across three high-risk safety categories, namely harmful content, CBRN (Chemical, Biological, Radiological, Nuclear), and CSEM (Child Sexual Exploitation Material), tested against seven frontier models. We evaluate several attack techniques against MLLMs, including FigStep-Pro (visual keyword decomposition), Intelligent Masking (semantic obfuscation), and audio perturbations (Wave-Echo, Wave-Pitch, Wave-Speed). The results reveal severe…


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Topic Modeling