Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models
Badhan Chandra Das, Md Tasnim Jawad, Joaquin Molto, M. Hadi Amini, Yanzhao Wu

TL;DR
This paper identifies security vulnerabilities in Multi-modal Large Language Models (MLLMs), introduces a novel multi-turn jailbreaking attack, and proposes a defense mechanism called FragGuard, validated through extensive experiments on various models.
Contribution
The paper presents a new multi-turn jailbreaking attack and a fragment-optimized defense mechanism, FragGuard, to enhance security in MLLMs, along with comprehensive experimental evaluation.
Findings
The multi-turn jailbreaking attack effectively exploits MLLM vulnerabilities.
FragGuard significantly reduces success rates of jailbreaking attacks.
Experimental results demonstrate improved security of MLLMs with FragGuard.
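The findings above are stated in terms of attack success rates. As a minimal sketch of how such a rate could be computed and compared with and without a defense, the snippet below assumes each attack attempt has already been judged as successful or not. The function name `attack_success_rate` and the sample outcomes are hypothetical illustrations, not the paper's evaluation code or its results.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of attempts judged successful (True = jailbreak succeeded)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


# Hypothetical judged outcomes for illustration only; not data from the paper.
undefended = [True, True, False, True]
defended = [False, True, False, False]

asr_before = attack_success_rate(undefended)
asr_after = attack_success_rate(defended)
print(f"ASR without defense: {asr_before:.2f}, with defense: {asr_after:.2f}")
```

A lower rate under the defended condition is what a claim like "reduces success rates" would correspond to under this definition; how the paper judges an individual attempt as successful is not specified in this summary.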
Abstract
In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These highly capable models perform multi-modal tasks with high accuracy, yet they are also severely susceptible to carefully crafted security attacks, such as jailbreaking attacks, which can manipulate model behavior and bypass safety constraints. This paper introduces MJAD-MLLMs, a holistic framework that systematically analyzes the proposed multi-turn jailbreaking attacks and multi-LLM-based defense techniques for MLLMs. We make three original contributions. First, we introduce a novel multi-turn jailbreaking attack that exploits the vulnerabilities of MLLMs under multi-turn prompting. Second, we propose a novel fragment-optimized, multi-LLM defense mechanism, called…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Computational and Text Analysis Methods
