Exploring Vulnerabilities and Protections in Large Language Models: A Survey
Frank Weizhen Liu, Chenhui Hu

TL;DR
This survey reviews security vulnerabilities in Large Language Models, focusing on prompt hacking and adversarial attacks, and discusses defense strategies to enhance their resilience against such threats.
Contribution
The survey provides a structured analysis of LLM vulnerabilities and evaluates existing defense mechanisms, offering insights into building more secure AI systems.
Findings
Prompt Injection and Jailbreaking attacks pose significant risks.
Data Poisoning and Backdoor attacks threaten model integrity.
Robust defense frameworks can mitigate these vulnerabilities.
Abstract
As Large Language Models (LLMs) increasingly become key components in various AI applications, understanding their security vulnerabilities and the effectiveness of defense mechanisms is crucial. This survey examines the security challenges of LLMs, focusing on two main areas: Prompt Hacking and Adversarial Attacks, each with specific types of threats. Under Prompt Hacking, we explore Prompt Injection and Jailbreaking Attacks, discussing how they work, their potential impacts, and ways to mitigate them. Similarly, we analyze Adversarial Attacks, breaking them down into Data Poisoning Attacks and Backdoor Attacks. This structured examination helps us understand the relationships between these vulnerabilities and the defense strategies that can be implemented. The survey highlights these security challenges and discusses robust defensive frameworks to protect LLMs against these threats.…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning
