Text is All You Need for Vision-Language Model Jailbreaking

Yihang Chen; Zhao Xu; Youyuan Jiang; Tianle Zheng; Cho-Jui Hsieh

arXiv:2602.00420 · cs.CV · February 3, 2026

Open Access

TL;DR

This paper introduces Text-DJ, a jailbreak attack that exploits the OCR capability of LVLMs: it splits a harmful query into more benign sub-queries, mixes them with unrelated distraction queries, and presents everything as a grid of images, exposing a gap in current safety measures.

Contribution

The work presents a novel jailbreak method that combines query decomposition with distraction queries, delivering both as text rendered in images that the LVLM reads via OCR, to bypass its safety filters.

Findings

1. Successfully bypasses safety safeguards in state-of-the-art LVLMs
2. Reveals OCR vulnerabilities to multi-image adversarial inputs
3. Highlights need for improved defenses against fragmented multimodal attacks

Abstract

Large Vision-Language Models (LVLMs) are increasingly equipped with robust safety safeguards to prevent responses to harmful or disallowed prompts. However, these defenses often focus on analyzing explicit textual inputs or relevant visual scenes. In this work, we introduce Text-DJ, a novel jailbreak attack that bypasses these safeguards by exploiting the model's Optical Character Recognition (OCR) capability. Our methodology consists of three stages. First, we decompose a single harmful query into multiple semantically related but more benign sub-queries. Second, we pick a set of distraction queries that are maximally irrelevant to the harmful query. Third, we present all decomposed sub-queries and distraction queries to the LVLM simultaneously as a grid of images, with the sub-queries placed in the middle of the grid. We demonstrate that this method successfully…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications