Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy

TL;DR
This paper assesses the vulnerability of various open-source Large Language Models to prompt injection and jailbreak attacks, revealing significant behavioral differences and evaluating lightweight defenses that are often bypassed.
Contribution
It provides a comprehensive evaluation of prompt-based attack vulnerabilities across multiple LLMs and tests simple inference-time defenses, highlighting their limitations.
Findings
Models show diverse responses to prompt injections.
Lightweight defenses can mitigate simple attacks but fail against complex prompts.
Behavioral responses vary significantly across different LLMs.
Abstract
Large Language Models (LLMs) are widely deployed in real-world systems. Given their broader applicability, prompt engineering has become an efficient tool for resource-scarce organizations to adopt LLMs for their own purposes. At the same time, LLMs are vulnerable to prompt-based attacks. Thus, analyzing this risk has become a critical security requirement. This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluated several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques
