GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending   Against Prompt Injection Attacks

Rongchang Li; Minjie Chen; Chang Hu; Han Chen; Wenpeng; Xing; Meng Han

arXiv:2409.19521·cs.CR·October 1, 2024

GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks

Rongchang Li, Minjie Chen, Chang Hu, Han Chen, Wenpeng, Xing, Meng Han

PDF

Open Access 1 Models

TL;DR

GenTel-Safe introduces a comprehensive framework with a novel attack detection method and an extensive benchmark to evaluate and improve defenses against prompt injection attacks in large language models.

Contribution

It presents GenTel-Shield for attack detection and GenTel-Bench for evaluation, addressing vulnerabilities and gaps in existing safety mechanisms for LLMs.

Findings

01

GenTel-Shield achieves state-of-the-art detection rates.

02

Existing safety guardrails are vulnerable to prompt injection.

03

The benchmark includes over 84,800 attack scenarios.

Abstract

Large Language Models (LLMs) like GPT-4, LLaMA, and Qwen have demonstrated remarkable success across a wide range of applications. However, these models remain inherently vulnerable to prompt injection attacks, which can bypass existing safety mechanisms, highlighting the urgent need for more robust attack detection methods and comprehensive evaluation benchmarks. To address these challenges, we introduce GenTel-Safe, a unified framework that includes a novel prompt injection attack detection method, GenTel-Shield, along with a comprehensive evaluation benchmark, GenTel-Bench, which compromises 84812 prompt injection attacks, spanning 3 major categories and 28 security scenarios. To prove the effectiveness of GenTel-Shield, we evaluate it together with vanilla safety guardrails against the GenTel-Bench dataset. Empirically, GenTel-Shield can achieve state-of-the-art attack detection…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Models

🤗
GenTelLab/gentelshield-v1
model· 427 dl· ♡ 6
427 dl♡ 6

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Malware Detection Techniques · Electrostatic Discharge in Electronics

MethodsLinear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Attention Is All You Need · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding