A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy

Pedro H. Barcha Correia; Ryan W. Achjian; Diego E. G. Caetano de Oliveira; Ygor Acacio Maria; Victor Takashi Hayashi; Marcos Lopes; Charles Christian Miers; Marcos A. Simplicio Jr

arXiv:2601.22240·cs.CR·February 2, 2026

TL;DR

This paper systematically reviews prompt injection mitigation strategies for large language models, extends the NIST taxonomy, and provides a comprehensive catalog of defenses with effectiveness metrics to aid future research and practical implementation.

Contribution

It extends NIST's taxonomy for LLM defenses, catalogs 88 studies with effectiveness data, and offers a standardized framework for future research and development.

Findings

01. Identified additional defense categories beyond the NIST report

02. Extended the NIST taxonomy with new defense classifications

03. Compiled a catalog of defenses with effectiveness and open-source information

Abstract

The rapid advancement and widespread adoption of generative artificial intelligence (GenAI) and large language models (LLMs) have been accompanied by new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing, for instance, data leaks, unauthorized actions, or compromised outputs. As both offensive and defensive prompt injection techniques evolve quickly, a structured understanding of mitigation strategies becomes increasingly important. To address this need, this work presents the first systematic literature review on prompt injection mitigation strategies, covering 88 studies. Building upon NIST's report on adversarial machine learning, this work contributes to the field in several ways. First, it identifies studies beyond those documented in NIST's report and…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques