Death by a Thousand Prompts: Open Model Vulnerability Analysis

Amy Chang; Nicholas Conley; Harish Santhanalakshmi Ganesan; Adam Swanda

arXiv:2511.03247·cs.CR·November 6, 2025

TL;DR

This paper evaluates the security vulnerabilities of eight open-weight large language models, revealing significant multi-turn prompt injection and jailbreak risks that threaten safe deployment and require layered security measures.

Contribution

It provides a comprehensive adversarial testing framework and highlights systemic vulnerabilities in open-weight LLMs, emphasizing the need for security-focused design strategies.

Findings

01. Multi-turn attacks have success rates up to 92.78%.

02. Capability-focused models are more vulnerable than safety-oriented ones.

03. Open-weight models lack resilience across extended interactions.

Abstract

Open-weight models provide researchers and developers with accessible foundations for diverse downstream applications. We tested the safety and security postures of eight open-weight large language models (LLMs) to identify vulnerabilities that may impact subsequent fine-tuning and deployment. Using automated adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi-turn attacks achieving success rates between 25.86% and 92.78%, representing a 2× to 10× increase over single-turn baselines. These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions. We assess that alignment strategies and lab priorities significantly influence resilience:…
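
The abstract compares multi-turn attack success rates against single-turn baselines. As a minimal sketch of that arithmetic, the Python below computes a success rate as a percentage of attempts and the multiplier between the two modes. The AttackTally class, the function names, and the counts are all hypothetical and chosen for illustration; they are not the paper's tooling or its data.

```python
from dataclasses import dataclass


@dataclass
class AttackTally:
    """Counts for one model under one attack mode (hypothetical structure)."""
    attempts: int
    successes: int

    def success_rate(self) -> float:
        """Attack success rate as a percentage of attempts."""
        if self.attempts == 0:
            raise ValueError("no attempts recorded")
        return 100.0 * self.successes / self.attempts


def multi_turn_multiplier(single: AttackTally, multi: AttackTally) -> float:
    """How many times higher the multi-turn rate is than the single-turn rate."""
    base = single.success_rate()
    if base == 0:
        raise ValueError("single-turn rate is zero; ratio undefined")
    return multi.success_rate() / base


# Illustrative counts only; these are NOT results from the paper.
single_turn = AttackTally(attempts=200, successes=20)
multi_turn = AttackTally(attempts=200, successes=120)

print(f"single-turn success rate: {single_turn.success_rate():.2f}%")
print(f"multi-turn success rate:  {multi_turn.success_rate():.2f}%")
print(f"multiplier: {multi_turn_multiplier(single_turn, multi_turn):.1f}x")
```

With these made-up counts, the multiplier falls inside the 2× to 10× band the abstract reports, which is the only sense in which the example relates to the paper's numbers.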

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Information and Cyber Security