Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation

Youngjoon Lee; Taehyun Park; Yunho Lee; Jinu Gong; Joonhyuk Kang

arXiv:2501.18416·cs.LG·May 5, 2026

TL;DR

This paper examines prompt injection threats in federated military LLMs, identifying vulnerabilities and proposing a combined technical and policy framework for mitigation.

Contribution

It highlights four specific vulnerabilities in federated military LLMs and introduces a collaborative framework with technical and policy measures to mitigate prompt injection risks.

Findings

01

Identified four vulnerabilities: secret data leakage, free-rider exploitation, system disruption, and misinformation spread.

02

Proposed a human-AI collaborative framework with technical and policy countermeasures.

03

Emphasized joint AI-human policy development and verification of security protocols.

Abstract

Federated Learning (FL) is increasingly being adopted in military collaborations to develop Large Language Models (LLMs) while preserving data sovereignty. However, prompt injection attacks (malicious manipulations of input prompts) pose new threats that may undermine operational security, disrupt decision-making, and erode trust among allies. This perspective paper highlights four vulnerabilities in federated military LLMs: secret data leakage, free-rider exploitation, system disruption, and misinformation spread. To address these risks, we propose a human-AI collaborative framework with both technical and policy countermeasures. On the technical side, our framework uses red/blue team wargaming and quality assurance to detect and mitigate adversarial behaviors of shared LLM weights. On the policy side, it promotes joint AI-human policy development and verification of security protocols.
