Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures

Victoria Benjamin, Emily Braca, Israel Carter, Hafsa Kanchwala, Nava Khojasteh, Charly Landow, Yi Luo, Caroline Ma, Anna Magarelli, Rachel Mirin, Avery Moyer, Kayla Simpson, Amelia Skawinski, and Thomas Heverin

arXiv:2410.23308·cs.CR·November 1, 2024·3 cites

Open Access

TL;DR

This paper systematically evaluates the vulnerability of 36 large language models to prompt injection attacks, revealing widespread susceptibility linked to model size and architecture, and highlights the need for robust defenses.

Contribution

It provides a comprehensive analysis of prompt injection vulnerabilities across diverse LLM architectures, identifying key factors influencing susceptibility and potential overlaps in attack techniques.

Findings

01. 56% of prompt injection tests succeeded

02. Vulnerability correlates with model size and architecture

03. Distinct vulnerability profiles linked to model configurations

Abstract

This study systematically analyzes the vulnerability of 36 large language models (LLMs) to various prompt injection attacks, a technique that leverages carefully crafted prompts to elicit malicious LLM behavior. Across 144 prompt injection tests, we observed a strong correlation between model parameters and vulnerability, with statistical analyses, such as logistic regression and random forest feature analysis, indicating that parameter size and architecture significantly influence susceptibility. Results revealed that 56 percent of tests led to successful prompt injections, emphasizing widespread vulnerability across various parameter sizes, with clustering analysis identifying distinct vulnerability profiles associated with specific model configurations. Additionally, our analysis uncovered correlations between certain prompt injection techniques, suggesting potential overlaps in…
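As a rough illustration of the kind of analysis the abstract describes, the sketch below fits a one-feature logistic regression of injection success on model size. It is not the authors' code or data: the `records` list is synthetic and hypothetical, the log10 scaling of parameter count is an assumption, and the direction of the effect in the toy data is arbitrary. The fit uses plain batch gradient descent so the example needs only the standard library.

```python
import math

# Hypothetical, synthetic records: (parameter count in billions, 1 if the
# injection succeeded else 0). Illustrative only; NOT data from the paper.
records = [
    (0.5, 1), (1.0, 1), (3.0, 1), (7.0, 1), (7.0, 0), (13.0, 1),
    (13.0, 0), (34.0, 0), (70.0, 1), (70.0, 0), (180.0, 0), (400.0, 0),
]


def success_rate(rows):
    """Fraction of tests in which the injection succeeded."""
    return sum(y for _, y in rows) / len(rows)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.1, epochs=5000):
    """Fit P(success) = sigmoid(b0 + b1 * log10(params)) by batch gradient descent."""
    xs = [math.log10(p) for p, _ in rows]  # assumed feature scaling
    ys = [y for _, y in rows]
    n = len(rows)
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


b0, b1 = fit_logistic(records)
print(f"success rate: {success_rate(records):.2f}")
print(f"intercept={b0:.3f}, slope per log10(params)={b1:.3f}")
```

The sign of the fitted slope indicates whether success probability rises or falls with model size in the toy data. A real analysis would also encode architecture as a feature and report significance, which this sketch omits.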

Taxonomy

Topics: Security and Verification in Computing · Radiation Effects in Electronics · Smart Grid Security and Resilience

Methods: Logistic Regression