Securing Large Language Models (LLMs) from Prompt Injection Attacks

Omar Farooq Khan Suri; John McCrae

arXiv:2512.01326·cs.CR·December 2, 2025

Securing Large Language Models (LLMs) from Prompt Injection Attacks

Omar Farooq Khan Suri, John McCrae

PDF

Open Access

TL;DR

This paper evaluates the robustness of a fine-tuning approach called JATMO against prompt injection attacks on large language models, revealing its partial effectiveness and highlighting the need for layered defenses.

Contribution

It adapts and tests the JATMO fine-tuning method against a new genetic attack framework, HOUYI, providing insights into its strengths and limitations.

Findings

01

JATMO reduces attack success rates but does not fully prevent prompt injections.

02

Adversaries can bypass defenses using multilingual cues or code-related prompts.

03

There is a trade-off between model performance and vulnerability to injections.

Abstract

Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-following ability to make it perform malicious tasks. Recent work has proposed JATMO, a task-specific fine-tuning approach that trains non-instruction-tuned base models to perform a single function, thereby reducing susceptibility to adversarial instructions. In this study, we evaluate the robustness of JATMO against HOUYI, a genetic attack framework that systematically mutates and optimizes adversarial prompts. We adapt HOUYI by introducing custom fitness scoring, modified mutation logic, and a new harness for local model testing, enabling a more accurate assessment of defense effectiveness. We fine-tuned LLaMA 2-7B, Qwen1.5-4B, and Qwen1.5-0.5B models under the JATMO methodology and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Artificial Intelligence in Healthcare and Education