Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors

Yi Zhao; Youzhi Zhang

arXiv:2501.14250 · cs.CL · November 14, 2025

TL;DR

Siren is a learning-based framework that simulates real-world multi-turn jailbreak attacks on large language models, outperforming existing single-turn methods and providing insights for developing stronger defenses.

Contribution

The paper introduces Siren, a multi-turn attack framework that learns attack strategies to mimic how humans jailbreak LLMs. It improves on prior approaches, which are either single-turn or follow static, predefined patterns.

Findings

1. Achieves a 90% attack success rate (ASR) against Gemini-1.5-Pro.
2. Achieves a 70% ASR against GPT-4o with Mistral-7B as the attacker.
3. Performs comparably to a multi-turn attack that uses GPT-4o as the attacker, while requiring fewer turns.
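The findings are reported as attack success rates. The sketch below assumes ASR is the fraction of attack goals for which the target's response is judged a successful jailbreak. This section does not say how Siren judges success, so that definition is an assumption. Under it, the metric reduces to a simple ratio. The helper name `attack_success_rate` and the 10-goal example are hypothetical and are not data from the paper.

```python
from collections.abc import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attack goals judged successful (hypothetical helper).

    Each entry is assumed to be one goal's judged outcome: True if the
    attack succeeded within the allowed turns, False otherwise.
    """
    if len(outcomes) == 0:
        raise ValueError("need at least one outcome")
    # Booleans sum as 0/1, so this is (#successes) / (#goals).
    return sum(outcomes) / len(outcomes)


# Hypothetical outcomes for 10 attack goals (illustrative only).
outcomes = [True] * 9 + [False]
print(f"ASR = {attack_success_rate(outcomes):.0%}")  # prints "ASR = 90%"
```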

Abstract

Large language models (LLMs) are widely used in real-world applications, raising concerns about their safety and trustworthiness. While red-teaming with jailbreak prompts exposes the vulnerabilities of LLMs, current efforts focus primarily on single-turn attacks, overlooking the multi-turn strategies used by real-world adversaries. Existing multi-turn methods rely on static patterns or predefined logical chains, failing to account for the strategies adversaries adapt dynamically over the course of an attack. We propose Siren, a learning-based multi-turn attack framework designed to simulate real-world human jailbreak behaviors. Siren consists of three stages: (1) MiniMax-driven training set construction using turn-level LLM feedback, (2) post-training attackers with supervised fine-tuning (SFT) and direct preference optimization (DPO), and (3) interactions between the attacking and target LLMs. Experiments demonstrate…
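Stage (2) combines SFT with direct preference optimization (DPO). The abstract gives neither the objective nor the hyperparameters. The plain-Python sketch below is therefore the standard, widely used DPO loss, not Siren's training code. This section does not describe how Siren builds chosen and rejected attacker responses from turn-level feedback. The function names, the default β = 0.1, and the use of the SFT checkpoint as the reference model are all assumptions.

```python
import math
from collections.abc import Sequence


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For z < 0, rewrite to avoid overflow in exp(-z).
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper name).

    Arguments are summed log-probabilities of the chosen (preferred) and
    rejected responses under the trainable policy and a frozen reference
    model (assumed here to be the SFT checkpoint, a common choice).
    beta = 0.1 is an assumed default, not a value from the paper.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # -log sigmoid(beta * (chosen margin - rejected margin))
    return neg_log_sigmoid(beta * (chosen_logratio - rejected_logratio))


def batch_dpo_loss(pairs: Sequence[tuple[float, float, float, float]],
                   beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)
```

When the policy and reference assign the same log-probabilities, the loss equals log 2 (about 0.693). The loss falls below that as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model.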

Code & Models

Repositories

yiyiyizhao/siren (PyTorch, official)
