Exploring Vulnerabilities and Protections in Large Language Models: A   Survey

Frank Weizhen Liu; Chenhui Hu

arXiv:2406.00240·cs.LG·June 4, 2024·3 cites

Exploring Vulnerabilities and Protections in Large Language Models: A Survey

Frank Weizhen Liu, Chenhui Hu

PDF

Open Access

TL;DR

This survey reviews security vulnerabilities in Large Language Models, focusing on prompt hacking and adversarial attacks, and discusses defense strategies to enhance their resilience against such threats.

Contribution

It provides a structured analysis of LLM vulnerabilities and evaluates existing defense mechanisms, offering insights into building more secure AI systems.

Findings

01

Prompt Injection and Jailbreaking attacks pose significant risks.

02

Data Poisoning and Backdoor attacks threaten model integrity.

03

Robust defense frameworks can mitigate these vulnerabilities.

Abstract

As Large Language Models (LLMs) increasingly become key components in various AI applications, understanding their security vulnerabilities and the effectiveness of defense mechanisms is crucial. This survey examines the security challenges of LLMs, focusing on two main areas: Prompt Hacking and Adversarial Attacks, each with specific types of threats. Under Prompt Hacking, we explore Prompt Injection and Jailbreaking Attacks, discussing how they work, their potential impacts, and ways to mitigate them. Similarly, we analyze Adversarial Attacks, breaking them down into Data Poisoning Attacks and Backdoor Attacks. This structured examination helps us understand the relationships between these vulnerabilities and the defense strategies that can be implemented. The survey highlights these security challenges and discusses robust defensive frameworks to protect LLMs against these threats.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsPrivacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning