Hard to Read, Easy to Jailbreak: How Visual Degradation Bypasses MLLM Safety Alignment
Zhixue Song, Boyan Han, Yiwei Wang, and Chi Zhang

TL;DR
This paper shows that when MLLMs read text rendered as images, lowering the image resolution weakens their safety defenses. It attributes the effect to cognitive overload and proposes a mitigation strategy.
Contribution
It identifies the impact of visual degradation on safety defenses in MLLMs and introduces a structured pipeline approach to mitigate this risk.
Findings
Safety defenses deteriorate sharply as image resolution decreases, even when the text remains legible.
The authors hypothesize that deciphering degraded inputs causes cognitive overload, diverting attention from safety auditing.
The proposed structured pipeline mitigates safety risks caused by visual degradation.
Abstract
Recent advancements in visual context compression enable MLLMs to process ultra-long contexts efficiently by rendering text into images. However, we identify a critical vulnerability inherent to this paradigm: lowering image resolution inadvertently catalyzes jailbreaking. Our experiments reveal that the safety defenses of SOTA models deteriorate sharply as resolution degrades and that, surprisingly, this deterioration persists even when the text remains legible. We attribute this to "Cognitive Overload", hypothesizing that the effort required to decipher degraded inputs diverts attentional resources from safety auditing. This phenomenon is consistent across various visual perturbations, including noise and geometric distortion. To address this, we propose a simple "Structured Cognitive Offloading" strategy that mitigates these risks by enforcing a serialized pipeline to decouple visual transcription from safety…
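The abstract describes the mitigation only at a high level, so the following is a minimal sketch of what a serialized pipeline of this kind might look like, not the authors' implementation. The names `serialized_pipeline`, `PipelineResult`, `transcribe`, `audit`, and `respond` are all hypothetical, and the three callables stand in for whatever model calls a real system would make. The only idea taken from the abstract is the ordering: visual transcription runs first, and the safety check then operates on the resulting plain text rather than on the degraded image.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical fixed refusal message used when the audit flags the request.
REFUSAL = "I can't help with that request."


@dataclass
class PipelineResult:
    transcript: str  # text recovered from the image in stage 1
    flagged: bool    # True if the stage-2 audit rejected the transcript
    response: str    # final answer, or REFUSAL when flagged


def serialized_pipeline(
    image: Any,
    transcribe: Callable[[Any], str],
    audit: Callable[[str], bool],
    respond: Callable[[str], str],
) -> PipelineResult:
    """Run transcription, safety audit, and answering as separate steps.

    All three callables are assumptions made for illustration; in practice
    each would wrap a model call with its own prompt.
    """
    # Stage 1: transcription only. The model is asked to read the image
    # and nothing else, so deciphering effort is spent here.
    transcript = transcribe(image)

    # Stage 2: the safety audit sees clean text, not the degraded image.
    flagged = audit(transcript)
    if flagged:
        return PipelineResult(transcript, True, REFUSAL)

    # Stage 3: answer only requests that passed the audit.
    return PipelineResult(transcript, False, respond(transcript))


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model.
    result = serialized_pipeline(
        image=b"fake-image-bytes",
        transcribe=lambda img: "summarize this meeting transcript",
        audit=lambda text: "forbidden" in text,
        respond=lambda text: "Summary of: " + text,
    )
    print(result)
```

Passing the stages in as callables keeps the ordering explicit and makes each stage easy to replace or test in isolation; whether the paper implements the separation through prompting, separate model calls, or some other mechanism is not stated in the text above.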
