Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation
Youngjoon Lee, Taehyun Park, Yunho Lee, Jinu Gong, Joonhyuk Kang

TL;DR
This paper examines prompt injection threats in federated military LLMs, identifying vulnerabilities and proposing a combined technical and policy framework for mitigation.
Contribution
It identifies four vulnerabilities in federated military LLMs and proposes a human-AI collaborative framework that pairs technical and policy measures to mitigate prompt injection risks.
Findings
Identified four vulnerabilities: secret data leakage, free-rider exploitation, system disruption, and misinformation spread.
Proposed a human-AI collaborative framework with technical and policy countermeasures.
Emphasized the importance of joint AI-human policy development for security.
Abstract
Federated Learning (FL) is increasingly being adopted in military collaborations to develop Large Language Models (LLMs) while preserving data sovereignty. However, prompt injection attacks (malicious manipulations of input prompts) pose new threats that may undermine operational security, disrupt decision-making, and erode trust among allies. This perspective paper highlights four vulnerabilities in federated military LLMs: secret data leakage, free-rider exploitation, system disruption, and misinformation spread. To address these risks, we propose a human-AI collaborative framework with both technical and policy countermeasures. On the technical side, our framework uses red/blue team wargaming and quality assurance to detect and mitigate adversarial behaviors of shared LLM weights. On the policy side, it promotes joint AI-human policy development and verification of security protocols.
