CourtGuard: A Local, Multiagent Prompt Injection Classifier

Isaac Wu; Michael Maslowski

arXiv:2510.19844·cs.CR·October 24, 2025

CourtGuard: A Local, Multiagent Prompt Injection Classifier

Isaac Wu, Michael Maslowski

PDF

Open Access

TL;DR

CourtGuard introduces a multiagent, court-like system for classifying prompt injections in LLMs, emphasizing lower false positives and advancing multiagent defense strategies despite some limitations in detection accuracy.

Contribution

This paper presents CourtGuard, a novel multiagent prompt injection classifier that uses a court-like system to improve false positive rates in prompt injection detection.

Findings

01

Lower false positive rate than the Direct Detector

02

Highlights importance of adversarial and benign scenario consideration

03

Advances multiagent system use in prompt injection defense

Abstract

As large language models (LLMs) become integrated into various sensitive applications, prompt injection, the use of prompting to induce harmful behaviors from LLMs, poses an ever increasing risk. Prompt injection attacks can cause LLMs to leak sensitive data, spread misinformation, and exhibit harmful behaviors. To defend against these attacks, we propose CourtGuard, a locally-runnable, multiagent prompt injection classifier. In it, prompts are evaluated in a court-like multiagent LLM system, where a "defense attorney" model argues the prompt is benign, a "prosecution attorney" model argues the prompt is a prompt injection, and a "judge" model gives the final classification. CourtGuard has a lower false positive rate than the Direct Detector, an LLM as-a-judge. However, CourtGuard is generally a worse prompt injection detector. Nevertheless, this lower false positive rate highlights the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling