Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

Piyush Jaiswal; Aaditya Pratap; Shreyansh Saraswati; Harsh Kasyap; Somanath Tripathy

arXiv:2602.22242 · cs.CR · February 27, 2026

Open Access

TL;DR

This paper assesses the vulnerability of various open-source Large Language Models to prompt injection and jailbreak attacks, revealing significant behavioral differences and evaluating lightweight defenses that are often bypassed.

Contribution

The paper provides a comprehensive evaluation of prompt-based attack vulnerabilities across multiple open-source LLMs and tests simple inference-time defenses, highlighting their limitations.

Findings

01. Models respond to prompt injections in different ways, ranging from explicit refusals to complete silent non-responsiveness.

02. Lightweight defenses can mitigate simple attacks but fail against complex prompts.

03. Behavioral responses vary significantly across different LLMs.

Abstract

Large Language Models (LLMs) are widely deployed in real-world systems. Given their broad applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks, so analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluate several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning.…
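To make the idea of an inference-time filter concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say which filters were evaluated, so the pattern list, the refusal markers, and the names `is_suspicious`, `guarded_generate`, and `classify_response` are all hypothetical and chosen only for illustration. The sketch wraps an arbitrary text-generation callable with a pattern-based input check and labels a model reply as silent, refusal, or answered, mirroring the behaviours the abstract describes.

```python
import re
from typing import Callable

# Hypothetical injection patterns; NOT taken from the paper.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|previous) prompt",
    r"pretend (that )?you have no (rules|restrictions)",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS]

# Hypothetical phrases used to recognise a refusal; also an assumption.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")


def is_suspicious(prompt: str) -> bool:
    """Return True if the prompt matches any known injection pattern."""
    return any(p.search(prompt) for p in _COMPILED)


def guarded_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Call the model only if the input filter lets the prompt through.

    `generate` stands in for any model call; no retraining is involved.
    """
    if is_suspicious(prompt):
        return "Request blocked by input filter."
    return generate(prompt)


def classify_response(text: str) -> str:
    """Label a reply as 'silent', 'refusal', or 'answered'."""
    if not text.strip():
        return "silent"  # the model produced no visible output
    lowered = text.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return "refusal"
    return "answered"
```

A filter of this kind catches a literal phrase such as "ignore all previous instructions" but lets through a paraphrase such as "Kindly set aside what you were told earlier", which is one plausible way a lightweight defence can be bypassed by a more elaborate prompt.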

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques