Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models
Quanchen Zou, Moyang Chen, Zonghao Ying, Wenzhuo Xu, Yisong Xiao, Deyue Zhang, Dongdong Yang, Zhao Liu, Xiangzheng Zhang

TL;DR
This paper introduces Reasoning-Oriented Programming, a novel attack paradigm that chains benign visual inputs to induce harmful reasoning in large vision-language models, bypassing safety defenses.
Contribution
It formalizes the attack as a programming paradigm and develops an automated framework that generates semantic gadgets to evade current safety alignment.
Findings
The framework effectively bypasses safety alignment in 7 state-of-the-art LVLMs.
It outperforms existing baselines by 4.67% on open-source models.
It achieves a 9.50% improvement on commercial models.
Abstract
Large Vision-Language Models (LVLMs) undergo safety alignment to suppress harmful content. However, current defenses predominantly target explicit malicious patterns in the input representation, often overlooking the vulnerabilities inherent in compositional reasoning. In this paper, we identify a systemic flaw where LVLMs can be induced to synthesize harmful logic from benign premises. We formalize this attack paradigm as Reasoning-Oriented Programming, drawing a structural analogy to Return-Oriented Programming (ROP) in systems security. Just as ROP circumvents memory protections by chaining benign instruction sequences, our approach exploits the model's instruction-following capability to orchestrate a semantic collision of orthogonal benign inputs. We instantiate this paradigm via an automated framework that optimizes for semantic orthogonality and…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing
