Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment

Zhixue Song, Boyan Han, Yiwei Wang, and Chi Zhang

arXiv:2605.07250·cs.CV·May 11, 2026

TL;DR

This paper uncovers a vulnerability in MLLMs that read text rendered as images: lowering image resolution causes safety defenses to fail. The authors attribute the failure to cognitive overload and propose a mitigation strategy to improve security.

Contribution

It identifies how visual degradation weakens safety defenses in MLLMs and introduces a structured, serialized pipeline to mitigate this risk.

Findings

01 Safety defenses deteriorate sharply as image resolution degrades, even when the text remains legible.

02 Degraded inputs are hypothesized to cause cognitive overload, diverting attention from safety auditing.

03 The proposed structured pipeline mitigates the safety risks caused by visual degradation.
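The serialized pipeline in the third finding can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `structured_pipeline`, the `PipelineResult` type, and the three callables `transcribe`, `is_unsafe`, and `respond` are hypothetical names introduced here, and the sketch assumes the safety step is an audit of the transcribed text. The only point it shows is ordering: the text is read out of the image first, and the safety decision is made on the transcript rather than on the degraded image.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str  # text read out of the image in stage 1
    refused: bool    # True if the safety audit blocked the request
    answer: str      # refusal message or the model's response


def structured_pipeline(
    image: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical: image -> plain text
    is_unsafe: Callable[[str], bool],    # hypothetical: safety audit on text
    respond: Callable[[str], str],       # hypothetical: answer a text request
    refusal: str = "I can't help with that.",
) -> PipelineResult:
    """Run transcription, safety audit, and answering as separate serial steps."""
    # Stage 1: transcription only; nothing is answered at this point.
    transcript = transcribe(image)
    # Stage 2: audit the transcript, so the check does not depend on image quality.
    if is_unsafe(transcript):
        return PipelineResult(transcript, True, refusal)
    # Stage 3: respond only to requests that passed the audit.
    return PipelineResult(transcript, False, respond(transcript))
```

In practice each callable would wrap a model call; passing them in as parameters keeps the sketch independent of any particular MLLM API.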

Abstract

Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of state-of-the-art models deteriorate sharply as resolution degrades and, surprisingly, that this deterioration persists even when the text remains legible. We attribute this to "Cognitive Overload", hypothesizing that the effort required to decipher degraded inputs diverts attentional resources from safety auditing. This phenomenon is consistent across various visual perturbations, including noise and geometric distortion. To address this, we propose a simple "Structured Cognitive Offloading" strategy that mitigates these risks by enforcing a serialized pipeline to decouple visual transcription from safety…
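To make the perturbations named in the abstract concrete, here is a minimal Python sketch of two of them, resolution reduction and additive noise, applied to a grayscale image stored as nested lists. The abstract does not say how the authors degrade images, so the block-averaging downscale, the Gaussian noise model, and the names `downscale` and `add_noise` are all assumptions made for illustration.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1], row-major


def downscale(img: Image, factor: int) -> Image:
    """Lower resolution by averaging each factor x factor block of pixels."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    height, width = len(img) // factor, len(img[0]) // factor
    out: Image = []
    for r in range(height):
        row = []
        for c in range(width):
            # Collect the pixels of one block and replace them by their mean.
            block = [
                img[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def add_noise(img: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]
```

Sweeping `factor` or `sigma` over a range of values is one way to produce the graded degradation levels that a resolution-versus-safety experiment would need.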
