Benchmarking LLAMA Model Security Against OWASP Top 10 For LLM Applications

Nourin Shahin; Izzat Alsmadi

arXiv:2601.19970 · cs.CR · January 29, 2026

TL;DR

This paper benchmarks the security of various Llama models against OWASP Top 10 threats, revealing that smaller, specialized models can outperform larger ones in threat detection accuracy and response safety.

Contribution

It introduces a comprehensive security benchmarking framework for Llama models, including an open-source dataset and analysis of model size versus security effectiveness.

Findings

01

Llama-Guard-3-1B achieved the highest detection accuracy (76%) with low latency (0.165s per test).

02

Base models such as Llama-3.1-8B failed to detect any threats (0% accuracy).

03

Smaller models often outperform larger ones in security tasks.

Abstract

As large language models (LLMs) move from research prototypes to enterprise systems, their security vulnerabilities pose serious risks to data privacy and system integrity. This study benchmarks various Llama model variants against the OWASP Top 10 for LLM Applications framework, evaluating threat detection accuracy, response safety, and computational overhead. Using the FABRIC testbed with NVIDIA A30 GPUs, we tested five standard Llama models and five Llama Guard variants on 100 adversarial prompts covering ten vulnerability categories. Our results reveal significant differences in security performance: the compact Llama-Guard-3-1B model achieved the highest detection rate of 76% with minimal latency (0.165s per test), whereas base models such as Llama-3.1-8B failed to detect threats (0% accuracy) despite longer inference times (0.754s). We observe an inverse relationship between model…
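The abstract reports per-model detection accuracy and mean per-test latency over 100 adversarial prompts spanning ten categories. The sketch below shows one way such figures could be aggregated from per-prompt logs. It is not the authors' harness. The `TestResult` schema, the `is_flagged` scoring rule (treating an output whose first line is `unsafe` as a detection, in the style of Llama Guard verdicts), and all model and category labels are hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class TestResult:
    """One adversarial prompt run against one model (hypothetical schema)."""
    model: str
    category: str      # e.g. an OWASP Top 10 for LLM Applications category label
    raw_output: str    # text the model returned for the prompt
    latency_s: float   # wall-clock inference time in seconds


def is_flagged(raw_output: str) -> bool:
    """Assumed scoring rule: a verdict whose first line is 'unsafe' counts as a
    detection; anything else ('safe', a normal chat reply, a refusal) is a miss."""
    lines = raw_output.strip().splitlines()
    return bool(lines) and lines[0].strip().lower() == "unsafe"


def summarize(results):
    """Return {model: {"accuracy", "mean_latency_s", "per_category"}}.

    Every prompt is adversarial, so accuracy = flagged prompts / total prompts.
    """
    by_model = defaultdict(list)
    for r in results:
        by_model[r.model].append(r)

    summary = {}
    for model, runs in by_model.items():
        per_cat = defaultdict(list)
        for r in runs:
            per_cat[r.category].append(is_flagged(r.raw_output))
        summary[model] = {
            "accuracy": sum(is_flagged(r.raw_output) for r in runs) / len(runs),
            "mean_latency_s": mean(r.latency_s for r in runs),
            "per_category": {c: sum(v) / len(v) for c, v in per_cat.items()},
        }
    return summary


if __name__ == "__main__":
    # Toy data only, NOT the paper's results: two prompts per model.
    demo = [
        TestResult("guard-model", "prompt_injection", "unsafe\nS1", 0.25),
        TestResult("guard-model", "data_leakage", "safe", 0.125),
        TestResult("base-model", "prompt_injection", "Sure, here is...", 0.75),
        TestResult("base-model", "data_leakage", "I cannot help with that.", 0.5),
    ]
    for model, stats in summarize(demo).items():
        print(model, stats)
```

Because every prompt in such a benchmark is adversarial, accuracy here reduces to recall on the unsafe class. Under this assumed rule a polite refusal is not counted as a detection, so response safety would need its own separate check.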

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Web Application Security Vulnerabilities