Attack and defense techniques in large language models: A survey and new perspectives

Zhiyu Liao; Kang Chen; Yuanguo Lin; Kangkang Li; Yunxuan Liu; Hefeng Chen; Xingwang Huang; Yuanhui Yu

arXiv:2505.00976 · cs.CR · May 5, 2025

TL;DR

This survey reviews attack and defense techniques for large language models, highlighting current methods, challenges, open problems, and future directions for enhancing their security and robustness.

Contribution

It provides a comprehensive classification of attack and defense strategies in LLMs and discusses open challenges and future research directions.

Findings

01 Adversarial prompt attacks and optimized attacks pose significant threats.

02 Defense strategies include prevention-based and detection-based methods.

03 Open challenges include adapting to evolving threats, balancing usability with robustness, and working within resource constraints.

Abstract

Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present significant security and ethical challenges. This systematic survey explores the evolving landscape of attack and defense techniques in LLMs. We classify attacks into adversarial prompt attacks, optimized attacks, model theft, and attacks on LLM applications, detailing their mechanisms and implications. We then analyze defense strategies, including prevention-based and detection-based methods. Although advances have been made, challenges remain in adapting to the dynamic threat landscape, balancing usability with robustness, and addressing resource constraints in defense implementation. We highlight open problems, including the need for adaptive, scalable defenses, explainable security techniques, and standardized evaluation frameworks. This…

Taxonomy

Topics: Network Security and Intrusion Detection · Hate Speech and Cyberbullying Detection