SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters
Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Xuetao Wei, Hailiang Huang, Guanhua Chen, Yun Chen

TL;DR
SeqAR is a framework that automatically generates sequential jailbreak prompts to bypass LLM safety guardrails, achieving high success rates without human-crafted templates, and evaluates transferability and defenses.
Contribution
It introduces SeqAR, a novel method for automatic jailbreak prompt generation using open-source LLMs, improving attack success and transferability over prior approaches.
Findings
Achieves 88% success rate against GPT-3.5
Achieves 60% success rate against GPT-4
Demonstrates transferability across different LLMs
Abstract
The widespread application of large language models (LLMs) has raised concerns about their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SeqAR, a simple yet effective framework to design jailbreak prompts automatically. The SeqAR framework generates and optimizes multiple jailbreak characters and then applies sequential jailbreak characters in a single query to bypass the guardrails of the target LLM. Unlike previous work, which relies on proprietary LLMs or seed jailbreak templates crafted with human expertise, SeqAR can generate and optimize the jailbreak prompt in a cold-start scenario using open-source LLMs without any seed jailbreak templates. Experimental results show that SeqAR achieves attack…
Taxonomy
Topics: Advanced Malware Detection Techniques · Information and Cyber Security · Digital and Cyber Forensics
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout
