PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

Quanchen Zou; Zonghao Ying; Moyang Chen; Wenzhuo Xu; Yisong Xiao; Yakai Li; Deyue Zhang; Dongdong Yang; Zhao Liu; Xiangzheng Zhang

arXiv:2507.21540·cs.CR·April 9, 2026

TL;DR

This paper introduces a jailbreak method for large vision-language models (LVLMs) that decomposes a harmful instruction into individually benign visual components and relies on the model's own multi-step reasoning to reassemble them, exposing a weakness in current safety alignment.

Contribution

The authors propose a framework inspired by Return-Oriented Programming (ROP) that bypasses safety measures by exploiting the compositional reasoning of LVLMs, and they report its success on multiple benchmarks.

Findings

01 Achieves over 0.90 attack success rate on SafeBench

02 Outperforms existing baselines significantly

03 Reveals a critical vulnerability in LVLM safety defenses
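
For readers unfamiliar with the metric in the first finding, attack success rate is usually read as the fraction of attack attempts that a judge marks as successful. The sketch below assumes that common definition; the page does not state the paper's exact judging protocol, and the function name `attack_success_rate` and the sample labels are hypothetical, made up only for illustration.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return the fraction of attempts judged successful.

    Assumes the common definition ASR = successes / attempts;
    the paper's own judging protocol is not given on this page.
    """
    if not judgments:
        raise ValueError("need at least one judged attempt")
    return sum(judgments) / len(judgments)


# Hypothetical judge labels for ten attempts (illustrative, not paper data).
labels = [True] * 9 + [False]
print(attack_success_rate(labels))  # 0.9
```

Under this reading, a reported rate above 0.90 means that more than nine in ten evaluated attempts were judged successful.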

Abstract

The increasing sophistication of large vision-language models (LVLMs) has been accompanied by advances in safety alignment mechanisms designed to prevent harmful content generation. However, these defenses remain vulnerable to sophisticated adversarial attacks. Existing jailbreak methods typically rely on direct and semantically explicit prompts, overlooking subtle vulnerabilities in how LVLMs compose information over multiple reasoning steps. In this paper, we propose a novel and effective jailbreak framework inspired by Return-Oriented Programming (ROP) techniques from software security. Our approach decomposes a harmful instruction into a sequence of individually benign visual gadgets. A carefully engineered textual prompt directs the sequence of inputs, prompting the model to integrate the benign visual gadgets through its reasoning process to produce a coherent and harmful output.…
