Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models

Palash Nandi; Maithili Joshi; Tanmoy Chakraborty

arXiv:2507.13761·cs.CL·July 21, 2025

Open Access

TL;DR

This paper investigates how prompt design factors and internal model structures influence the ability to jailbreak visual language models, revealing vulnerabilities and proposing a skip-connection framework to enhance jailbreak success rates.

Contribution

The paper introduces a skip-connection framework within VLMs that significantly increases jailbreak success, highlighting how vulnerable these models are to multimodal prompts.

Findings

01. VLMs distinguish benign and harmful inputs well in unimodal settings

02. Multimodal contexts significantly degrade model safety

03. Skip-connection framework boosts jailbreak success rates

Abstract

Language models are highly sensitive to prompt formulations: small changes in input can drastically alter their output. This raises a critical question: To what extent can prompt sensitivity be exploited to generate inapt content? In this paper, we investigate how discrete components of prompt design influence the generation of inappropriate content in Visual Language Models (VLMs). Specifically, we analyze the impact of three key factors on successful jailbreaks: (a) the inclusion of detailed visual information, (b) the presence of adversarial examples, and (c) the use of positively framed beginning phrases. Our findings reveal that while a VLM can reliably distinguish between benign and harmful inputs in unimodal settings (text-only or image-only), this ability significantly degrades in multimodal contexts. Each of the three factors is independently capable of triggering a jailbreak,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Computational and Text Analysis Methods