Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models

Badhan Chandra Das; Md Tasnim Jawad; Joaquin Molto; M. Hadi Amini; Yanzhao Wu

arXiv:2601.05339·cs.CR·January 12, 2026

TL;DR

This paper identifies security vulnerabilities in Multi-modal Large Language Models (MLLMs), introduces a novel multi-turn jailbreaking attack, and proposes a defense mechanism called FragGuard, validated through extensive experiments on various models.

Contribution

The paper presents a new multi-turn jailbreaking attack and a fragment-optimized defense mechanism, FragGuard, to enhance security in MLLMs, along with comprehensive experimental evaluation.

Findings

01 The multi-turn jailbreaking attack effectively exploits MLLM vulnerabilities.

02 FragGuard significantly reduces the success rates of jailbreaking attacks.

03 Experimental results demonstrate improved security of MLLMs with FragGuard.

Abstract

In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These highly capable models perform multi-modal tasks with high accuracy, yet they are also severely susceptible to carefully crafted security attacks, such as jailbreaking attacks, which can manipulate model behavior and bypass safety constraints. This paper introduces MJAD-MLLMs, a holistic framework that systematically analyzes the proposed multi-turn jailbreaking attacks and multi-LLM-based defense techniques for MLLMs. We make three original contributions. First, we introduce a novel multi-turn jailbreaking attack that exploits the vulnerabilities of MLLMs under multi-turn prompting. Second, we propose a novel fragment-optimized, multi-LLM defense mechanism, called…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Computational and Text Analysis Methods