Detecting Prompt Injection Attacks Against Application Using Classifiers

Safwan Shaheer; G. M. Refatul Islam; Mohammad Rafid Hamid; Md. Abrar Faiaz Khan; Md. Omar Faruk; Yaseen Nur

arXiv:2512.12583·cs.CR·December 16, 2025

Detecting Prompt Injection Attacks Against Application Using Classifiers

Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur

PDF

Open Access

TL;DR

This paper develops and evaluates machine learning classifiers to detect prompt injection attacks in web applications, enhancing security measures for large language model integrations.

Contribution

It introduces a dataset for prompt injection detection and compares multiple classifiers to improve attack identification.

Findings

01

LSTM and neural networks outperform traditional models

02

The dataset augmentation enhances detection accuracy

03

Classifiers effectively identify malicious prompts

Abstract

Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsWeb Application Security Vulnerabilities · Network Security and Intrusion Detection · Security and Verification in Computing