Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering
Yi Ji, Runzhi Li, Baolei Mao

TL;DR
This paper introduces DMPI-PMHFE, a dual-channel detection framework combining a pretrained language model and heuristic features to effectively identify prompt injection attacks across various LLMs, improving security and robustness.
Contribution
The paper presents a novel dual-channel detection method that integrates semantic and structural features, enhancing prompt injection attack detection across multiple LLMs.
Findings
Outperforms existing detection methods in accuracy, recall, and F1-score.
Reduces attack success rates significantly across mainstream LLMs.
Demonstrates effectiveness on diverse benchmark datasets.
Abstract
With the widespread adoption of Large Language Models (LLMs), prompt injection attacks have emerged as a significant security threat. Existing defense mechanisms often face critical trade-offs between effectiveness and generalizability. This highlights the urgent need for efficient prompt injection detection methods that are applicable across a wide range of LLMs. To address this challenge, we propose DMPI-PMHFE, a dual-channel feature fusion detection framework. It integrates a pretrained language model with heuristic feature engineering to detect prompt injection attacks. Specifically, the framework employs DeBERTa-v3-base as a feature extractor to transform input text into semantic vectors enriched with contextual information. In parallel, we design heuristic rules based on known attack patterns to extract explicit structural features commonly observed in attacks. Features from both…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
