DefenSee: Dissecting Threat from Sight and Text -- A Multi-View Defensive Pipeline for Multi-modal Jailbreaks

Zihao Wang; Kar Wai Fok; Vrizlynn L. L. Thing

arXiv:2512.01185·cs.CR·December 19, 2025

Open Access

TL;DR

DefenSee is a black-box defense for multi-modal large language models (MLLMs) that uses image-variant transcription and cross-modal consistency checks to lower jailbreak attack success rates while preserving performance on benign tasks.

Contribution

This paper introduces DefenSee, a novel multi-modal black-box defense technique that enhances MLLM security against jailbreaks through image variant transcription and cross-modal consistency checks.
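The flow named above (transcribe several variants of the image, then check the text and image views against each other) can be illustrated with a minimal sketch. This is not the authors' implementation: the page does not specify the variants, the transcription prompt, or the decision rule, so `make_variants`, `defend`, the toy pixel-grid image type, and the "refuse if any view is flagged" rule are all assumptions made for illustration. The MLLM is treated as a black box through three caller-supplied functions.

```python
from typing import Callable, List

# Toy stand-in for an image: rows of grayscale pixels in 0-255 (assumption).
Image = List[List[int]]

REFUSAL = "Sorry, I can't help with that request."


def make_variants(image: Image) -> List[Image]:
    """Return hypothetical image variants: original, mirrored, inverted."""
    flipped = [row[::-1] for row in image]
    inverted = [[255 - p for p in row] for row in image]
    return [image, flipped, inverted]


def defend(
    image: Image,
    prompt: str,
    transcribe: Callable[[Image], str],   # black-box MLLM call: image -> text
    is_unsafe: Callable[[str], bool],     # black-box safety judgment on text
    answer: Callable[[Image, str], str],  # black-box MLLM call: normal answer
) -> str:
    """Refuse if the prompt, any transcript, or any combination looks unsafe."""
    transcripts = [transcribe(v) for v in make_variants(image)]
    # Multi-view check (assumed rule): text alone, each image transcript
    # alone, and text combined with each transcript.
    views = [prompt] + transcripts + [f"{prompt}\n{t}" for t in transcripts]
    if any(is_unsafe(view) for view in views):
        return REFUSAL
    return answer(image, prompt)
```

Calling `defend` with a harmless prompt such as "Follow the steps in the image." still returns the refusal when `transcribe` reveals harmful text embedded in the image, which is the coordinated text-plus-image case a text-only filter would miss.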

Findings

01

Reduces jailbreak attack success rate to below 1.70% on MiniGPT4.

02

Outperforms prior defenses in robustness while preserving benign task performance.

03

Effective against coordinated multi-modal jailbreaks.
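The attack success rate (ASR) cited in the findings is conventionally the percentage of attack attempts that elicit a harmful response. A minimal sketch of that computation follows; the function name and the sample counts are illustrative assumptions, not figures from the paper, and how each attempt is judged successful is left to the evaluator.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the percentage of attempts marked successful (True)."""
    if not outcomes:
        raise ValueError("at least one attack attempt is required")
    return 100.0 * sum(outcomes) / len(outcomes)


# Illustrative data only: 3 successful attacks out of 200 attempts.
sample = [True] * 3 + [False] * 197
print(f"ASR = {attack_success_rate(sample):.2f}%")
```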

Abstract

Multi-modal large language models (MLLMs), capable of processing text, images, and audio, have been widely adopted in various AI applications. However, recent MLLMs integrating images and text remain highly vulnerable to coordinated jailbreaks. Existing defenses primarily focus on the text modality and lack robust multi-modal protection. As a result, studies indicate that MLLMs are more susceptible to malicious or unsafe instructions than their text-only counterparts. In this paper, we propose DefenSee, a robust and lightweight multi-modal black-box defense technique that leverages image-variant transcription and cross-modal consistency checks, mimicking human judgment. Experiments on popular multi-modal jailbreak and benign datasets show that DefenSee consistently enhances MLLM robustness while better preserving performance on benign tasks compared to SOTA defenses. It reduces the ASR of…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling