Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification

Hasini Jayathilaka

arXiv:2511.12295 · cs.CR · November 18, 2025

Open Access

TL;DR

This paper introduces a privacy-preserving framework for detecting prompt injection attacks on large language models using federated learning and embedding-based NLP classification, enabling effective detection without exposing raw data.

Contribution

It presents a novel federated learning approach for prompt injection detection that maintains user privacy while achieving performance comparable to centralized methods.

Findings

01 Federated approach preserves privacy effectively.

02 Detection performance comparable to centralized models.

03 Proof-of-concept for privacy-aware LLM security.

Abstract

Prompt injection attacks are an emerging threat to large language models (LLMs), enabling malicious users to manipulate outputs through carefully designed inputs. Existing detection approaches often require centralizing prompt data, creating significant privacy risks. This paper proposes a privacy-preserving prompt injection detection framework based on federated learning and embedding-based classification. A curated dataset of benign and adversarial prompts was encoded with sentence embeddings and used to train both centralized and federated logistic regression models. The federated approach preserved privacy by sharing only model parameters across clients, while achieving detection performance comparable to centralized training. Results demonstrate that effective prompt injection detection is feasible without exposing raw data, making this one of the first explorations of federated…
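The training setup the abstract describes (logistic regression over prompt embeddings, with clients sharing only model parameters) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's implementation: the embeddings are synthetic Gaussian vectors standing in for real sentence embeddings, the aggregation rule is size-weighted parameter averaging (FedAvg-style, which the abstract does not name), and the client count, dimension, learning rate, and round count are arbitrary. The helper names `local_update`, `federated_average`, and `make_synthetic_embeddings` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_update(w, b, X, y, lr=0.5, epochs=5):
    """Run a few full-batch gradient steps on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def federated_average(client_data, dim, rounds=20):
    """Size-weighted parameter averaging (assumed aggregation rule)."""
    w, b = np.zeros(dim), 0.0
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    for _ in range(rounds):
        # Each client trains locally; only (w, b) is sent back, never X or y.
        updates = [local_update(w, b, X, y) for X, y in client_data]
        w = sum(n * wu for n, (wu, _) in zip(sizes, updates)) / sizes.sum()
        b = sum(n * bu for n, (_, bu) in zip(sizes, updates)) / sizes.sum()
    return w, b


def make_synthetic_embeddings(rng, n, dim):
    """Synthetic stand-in for sentence embeddings: label 1 = injection, 0 = benign."""
    y = rng.integers(0, 2, size=n).astype(float)
    X = rng.normal(size=(n, dim)) + (2 * y[:, None] - 1) * 0.5
    return X, y


rng = np.random.default_rng(0)
DIM = 16  # assumed embedding size; real sentence embeddings are much larger
clients = [make_synthetic_embeddings(rng, 200, DIM) for _ in range(3)]
X_test, y_test = make_synthetic_embeddings(rng, 500, DIM)

w, b = federated_average(clients, DIM)
accuracy = np.mean((sigmoid(X_test @ w + b) > 0.5) == y_test)
print(f"federated accuracy on synthetic data: {accuracy:.3f}")
```

In a real deployment each client would compute embeddings of its own prompts locally with a sentence encoder and pass those to `local_update`; the server would see only the averaged weights. Sharing parameters alone does not by itself rule out leakage through the updates, so this sketch should be read as showing the data flow, not as a privacy guarantee.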

Taxonomy

Topics: Spam and Phishing Detection · Topic Modeling · Authorship Attribution and Profiling