Securing AI Agents Against Prompt Injection Attacks

Badrinath Ramakrishnan; Akshaya Balaji

arXiv:2511.15759·cs.CR·November 21, 2025

Securing AI Agents Against Prompt Injection Attacks

Badrinath Ramakrishnan, Akshaya Balaji

PDF

Open Access

TL;DR

This paper introduces a comprehensive benchmark and a multi-layered defense framework to mitigate prompt injection attacks in retrieval-augmented generation AI systems, significantly improving security without sacrificing performance.

Contribution

It provides the first extensive benchmark for prompt injection risks in RAG systems and proposes an effective multi-layered defense framework validated across multiple models.

Findings

01

Benchmark includes 847 adversarial test cases across five attack types.

02

Defense mechanisms reduce attack success rate from 73.2% to 8.7%.

03

Maintains 94.3% of baseline task performance.

Abstract

Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled AI agents and propose a multi-layered defense framework. Our benchmark includes 847 adversarial test cases across five attack categories: direct injection, context manipulation, instruction override, data exfiltration, and cross-context contamination. We evaluate three defense mechanisms: content filtering with embedding-based anomaly detection, hierarchical system prompt guardrails, and multi-stage response verification, across seven state-of-the-art language models. Our combined framework reduces successful attack rates from 73.2% to 8.7% while maintaining 94.3% of baseline task performance.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Topic Modeling · Security and Verification in Computing