Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs

Jiawen Wang; Pritha Gupta; Ivan Habernal; Eyke Hüllermeier

arXiv:2505.14368·cs.CR·May 21, 2025

TL;DR

This paper evaluates prompt injection vulnerabilities in 14 popular open-source LLMs, introducing a new metric and demonstrating effective attacks that cause models to generate harmful content, emphasizing the need for mitigation.

Contribution

It introduces the Attack Success Probability metric and proposes simple yet effective prompt injection attacks against open-source LLMs, revealing significant vulnerabilities.

Findings

01

The hypnotism attack achieves around 90% ASP on several aligned models, including Stablelm2, Mistral, Openchat, and Vicuna.

02

Ignore-prefix attacks break all tested models, with over 60% ASP.

03

Moderately well-known LLMs are more vulnerable.

Abstract

Recent studies demonstrate that Large Language Models (LLMs) are vulnerable to different prompt-based attacks, generating harmful content or sensitive information. Both closed-source and open-source LLMs are underinvestigated for these attacks. This paper studies effective prompt injection attacks against the 14 most popular open-source LLMs on five attack benchmarks. Current metrics only consider successful attacks, whereas our proposed Attack Success Probability (ASP) also captures uncertainty in the model's response, reflecting ambiguity in attack feasibility. By comprehensively analyzing the effectiveness of prompt injection attacks, we propose a simple and effective hypnotism attack; results show that this attack causes aligned language models, including Stablelm2, Mistral, Openchat, and Vicuna, to generate objectionable behaviors, achieving around 90% ASP. They also…
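To make the idea behind ASP concrete, here is a minimal sketch of a metric that counts uncertain responses alongside clear successes. It is an illustration under stated assumptions, not the paper's definition: the three labels (`success`, `uncertain`, `fail`), the function name, and the `uncertain_weight` of 0.5 are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical label set for judged model responses (an assumption for this sketch).
VALID_LABELS = {"success", "uncertain", "fail"}


def attack_success_probability(labels, uncertain_weight=0.5):
    """Return the fraction of attacks counted as successful.

    Clear successes count fully; uncertain responses count with a partial
    weight. The default weight of 0.5 is an illustrative assumption.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    weighted = counts["success"] + uncertain_weight * counts["uncertain"]
    return weighted / len(labels)


# Ten judged responses: 6 clear successes, 2 uncertain, 2 refusals.
example_labels = ["success"] * 6 + ["uncertain"] * 2 + ["fail"] * 2
asp = attack_success_probability(example_labels)
plain_rate = attack_success_probability(example_labels, uncertain_weight=0.0)
print(f"ASP-style score: {asp:.2f}, success-only rate: {plain_rate:.2f}")
```

Setting `uncertain_weight` to 0 recovers a success-only rate, which shows how a metric that ignores uncertain responses can understate how often an attack partially works.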

Taxonomy

Topics: Scientific Computing and Data Management · Digital Rights Management and Security · Access Control and Trust