From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem

Yanxu Mao; Tiehan Cui; Peipei Liu; Datao You; Hongsong Zhu

arXiv:2506.15170 · cs.CR · August 4, 2025

Open Access

TL;DR

This survey comprehensively reviews emerging jailbreak attack techniques and defense strategies in the evolving landscape of large language models, including multimodal models and intelligent agents, highlighting security challenges and future research directions.

Contribution

It provides an updated, structured analysis of attack and defense methods across LLM, MLLM, and agent paradigms, addressing gaps in existing surveys and proposing future research avenues.

Findings

01. Categorization of jailbreak techniques by impact and visibility
02. Analysis of datasets and evaluation metrics for attacks
03. Identification of limitations in current security strategies

Abstract

Large language models (LLMs) are rapidly evolving from single-modal systems to multimodal LLMs and intelligent agents, significantly expanding their capabilities while introducing increasingly severe security risks. This paper presents a systematic survey of the growing complexity of jailbreak attacks and corresponding defense mechanisms within the expanding LLM ecosystem. We first trace the developmental trajectory from LLMs to MLLMs and Agents, highlighting the core security challenges emerging at each stage. Next, we categorize mainstream jailbreak techniques from both the attack impact and visibility perspectives, and provide a comprehensive analysis of representative attack methods, related datasets, and evaluation metrics. On the defense side, we organize existing strategies based on response timing and technical approach, offering a structured understanding of their applicability…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Information and Cyber Security