Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs

Mingyu Yu; Lana Liu; Zhehao Zhao; Wei Wang; Sujuan Qin

arXiv:2601.15698 · cs.CV · January 23, 2026

TL;DR

This paper introduces BVS, a framework that exploits vulnerabilities in multimodal models to generate harmful images, revealing significant safety gaps in current MLLMs.

Contribution

The paper presents a novel jailbreaking method for MLLMs that effectively probes their visual safety boundaries using a reconstruction-then-generation approach.

Findings

01 BVS achieves a 98.21% jailbreak success rate against GPT-5 (12 January 2026 release).

02 Exposes critical vulnerabilities in the visual safety of current MLLMs.

03 Highlights the need for improved safety alignment in multimodal models.

Abstract

The rapid advancement of Multimodal Large Language Models (MLLMs) has introduced complex security challenges, particularly at the intersection of textual and visual safety. While existing schemes have explored the security vulnerabilities of MLLMs, their visual safety boundaries remain insufficiently investigated. In this paper, we propose Beyond Visual Safety (BVS), a novel image-text pair jailbreaking framework specifically designed to probe the visual safety boundaries of MLLMs. BVS employs a "reconstruction-then-generation" strategy, leveraging neutralized visual splicing and inductive recomposition to decouple malicious intent from raw inputs, thereby inducing MLLMs to generate harmful images. Experimental results demonstrate that BVS achieves a jailbreak success rate of 98.21% against GPT-5 (12 January 2026 release). Our findings expose…

Taxonomy

Topics: Digital Media Forensic Detection · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis