From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao, Tiehan Cui, Peipei Liu, Datao You, Hongsong Zhu

TL;DR
This survey comprehensively reviews emerging jailbreak attack techniques and defense strategies in the evolving landscape of large language models, including multimodal models and intelligent agents, highlighting security challenges and future research directions.
Contribution
The survey provides an updated, structured analysis of attack and defense methods across the LLM, MLLM, and agent paradigms, addressing gaps in existing surveys and proposing future research avenues.
Findings
Categorization of jailbreak techniques by impact and visibility
Analysis of datasets and evaluation metrics for attacks
Identification of limitations in current security strategies
Abstract
Large language models (LLMs) are rapidly evolving from single-modal systems to multimodal LLMs and intelligent agents, significantly expanding their capabilities while introducing increasingly severe security risks. This paper presents a systematic survey of the growing complexity of jailbreak attacks and corresponding defense mechanisms within the expanding LLM ecosystem. We first trace the developmental trajectory from LLMs to MLLMs and Agents, highlighting the core security challenges emerging at each stage. Next, we categorize mainstream jailbreak techniques from both the attack impact and visibility perspectives, and provide a comprehensive analysis of representative attack methods, related datasets, and evaluation metrics. On the defense side, we organize existing strategies based on response timing and technical approach, offering a structured understanding of their applicability…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Information and Cyber Security
