Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Piercosma Bisconti; Matteo Prandi; Federico Pierucci; Francesco Giarrusso; Marcantonio Bracale Syrnikov; Marcello Galisai; Vincenzo Suriani; Olga Sorokoletova; Federico Sartore; Daniele Nardi

arXiv:2511.15304 · cs.CL · January 19, 2026

Open Access

TL;DR

This paper demonstrates that adversarial poetry can effectively bypass safety measures in large language models, revealing a fundamental vulnerability that persists across various models and safety training approaches.

Contribution

It introduces poetic prompts as a universal single-turn jailbreak technique, showing their high success rates and transferability across multiple domains and model types.

Findings

01

Hand-crafted poetic prompts exceed a 90% attack success rate (ASR) on some providers' models.

02

Poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains.

03

Converting harmful prompts into verse yields ASRs up to 18 times higher than their prose baselines.

Abstract

We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 MLCommons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs were evaluated using an ensemble of three open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions…
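The abstract does not say how the three judges' binary verdicts are combined into a single label. As a minimal sketch, assuming a strict majority vote (an assumption, not the paper's documented rule), per-prompt labels, the resulting ASR, and the poetic-to-prose ASR ratio could be computed as follows. The function names and the verdict data are hypothetical and invented for illustration only.

```python
from typing import Sequence


def majority_unsafe(verdicts: Sequence[bool]) -> bool:
    """Assumed aggregation rule: True if a strict majority of judges flag the output as unsafe."""
    return sum(verdicts) > len(verdicts) / 2


def attack_success_rate(all_verdicts: Sequence[Sequence[bool]]) -> float:
    """Fraction of prompts whose output the (assumed majority-vote) judge ensemble labels unsafe."""
    if not all_verdicts:
        raise ValueError("no prompts to score")
    successes = sum(majority_unsafe(v) for v in all_verdicts)
    return successes / len(all_verdicts)


# Hypothetical verdicts: one row per prompt, one bool per judge (True = unsafe output).
poetic = [[True, True, False], [True, True, True], [False, True, False], [True, False, True]]
prose = [[False, False, False], [False, True, False], [False, False, False], [True, True, False]]

poetic_asr = attack_success_rate(poetic)
prose_asr = attack_success_rate(prose)
print(f"poetic ASR = {poetic_asr:.2f}, prose ASR = {prose_asr:.2f}, ratio = {poetic_asr / prose_asr:.1f}x")
# prints "poetic ASR = 0.75, prose ASR = 0.25, ratio = 3.0x"
```

A strict majority is used here because it gives a well-defined label for an odd number of binary judges; the paper may use a different combination rule.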

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education