Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection

Md Abdur Rahman; Hossain Shahriar; Fan Wu; Alfredo Cuzzocrea

arXiv:2409.13331 · cs.CL · September 23, 2024

TL;DR

This paper explores combining multilingual BERT embeddings with machine learning classifiers to detect malicious prompt injections against large language models. It reports 96.55% accuracy with Logistic Regression and analyzes misclassified prompts to expose model limitations.

Contribution

It introduces a novel approach of applying multilingual BERT embeddings for classifying malicious prompts, improving detection performance over existing methods.

Findings

01. Multilingual BERT embeddings significantly improve malicious prompt detection accuracy.

02. Logistic Regression achieved 96.55% accuracy in classifying malicious prompts.

03. Analysis of incorrect predictions offers insights into model limitations.

Abstract

Large language models (LLMs) are renowned for their exceptional capabilities and are applied across a wide range of applications. However, this widespread use brings significant vulnerabilities. There is also a large gap in effective detection and mitigation strategies against malicious prompt injection attacks on LLMs, as current approaches may not adequately address the complexity and evolving nature of these vulnerabilities in real-world applications. This work therefore focuses on the impact of malicious prompt injection attacks, one of the most dangerous vulnerabilities in real LLM applications. It applies several BERT (Bidirectional Encoder Representations from Transformers) variants, such as multilingual BERT and DistilBERT, to distinguish malicious prompts from legitimate ones. Also, we observed how tokenizing the prompt…
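
The pipeline the abstract describes, BERT-style embeddings fed to a classical classifier, might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the `bert-base-multilingual-cased` checkpoint, mean pooling over tokens, the hyperparameters, and every function name (`mbert_embed`, `train_logreg`, `predict`, `accuracy`) are hypothetical choices made here. The logistic regression is written in plain Python only to keep the sketch free of extra dependencies.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mbert_embed(prompts: Sequence[str],
                model_name: str = "bert-base-multilingual-cased") -> List[Vector]:
    """Return one mean-pooled multilingual BERT vector per prompt.

    Assumption: mean pooling over non-padding tokens; the pooling strategy
    used in the paper is not stated in the abstract.
    Requires `torch` and `transformers`, imported lazily so the classifier
    below can be used without them.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(list(prompts), padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.tolist()


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[Vector, float]:
    """Fit binary logistic regression with full-batch gradient descent.

    Labels: 1 = malicious prompt, 0 = legitimate prompt (assumed encoding).
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return 1 (malicious) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w: Vector, b: float, X: Sequence[Vector], y: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

With `torch` and `transformers` installed, `X = mbert_embed(prompts)` followed by `w, b = train_logreg(X, labels)` would fit the classifier on a labeled prompt set, and `accuracy` on a held-out split gives the kind of figure the findings report. The learning rate would likely need tuning for real 768-dimensional embeddings, and scikit-learn's `LogisticRegression` would be the more usual choice than this hand-written loop.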

Taxonomy

Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Attention Dropout · Dense Connections · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Layer Normalization