New Wide-Net-Casting Jailbreak Attacks Risk Large Models
Qiuchi Xiang, Haoxuan Qu, Hossein Rahmani, Jun Liu

TL;DR
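This paper introduces the wide-net-casting jailbreak scenario, in which an adversary queries a group of large models rather than a single one to elicit harmful outputs. It shows that this scenario poses substantial, previously overlooked safety risks and presents a tailored attack method whose success rate reaches 100% in some experiments against large models without additional safeguards.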
Contribution
It identifies a new high-risk jailbreak scenario involving multiple models and develops a tailored attack method demonstrating its severity.
Findings
Jailbreak success rate reaches 100% in some experiments against large models without additional safeguards
Wide-net-casting poses substantial safety risks
New tailored jailbreak method effectively exploits this scenario
Abstract
Jailbreak attacks on large models have drawn growing attention due to their close ties to societal safety. This work identifies a practical yet unexplored jailbreak scenario, the wide-net-casting scenario, where an adversary can query a group of large models instead of a single one to elicit harmful outputs. Our analysis reveals substantial yet previously overlooked safety risks under this scenario. As a key part of our analysis, we further develop a novel jailbreak method tailored to the wide-net-casting scenario. With this tailored method, the jailbreak success rate can even reach 100% in some experiments when targeting the large models without additional safeguards, exposing wide-net-casting as a distinct, high-risk scenario that warrants attention in future evaluation and defense research.
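For evaluators, one way to see why a group of models can carry more risk than any single model is to look at the group-level success rate, that is, the chance that at least one model in the group produces a harmful output. The sketch below is illustrative only and is not the paper's analysis or attack method: it assumes each model fails independently with a known per-model rate, and the function name `any_model_success_rate` and the example rates are hypothetical.

```python
from math import prod


def any_model_success_rate(per_model_rates):
    """Chance that at least one model in a group is jailbroken.

    Assumption (not from the paper): outcomes across models are
    independent, so the group rate is 1 - prod(1 - p_i).
    """
    for p in per_model_rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"rate {p!r} is not a probability in [0, 1]")
    # Probability that every model resists, then take the complement.
    return 1.0 - prod(1.0 - p for p in per_model_rates)


if __name__ == "__main__":
    # Hypothetical per-model rates, chosen only for illustration.
    single = [0.2]
    group = [0.2, 0.2, 0.2, 0.2, 0.2]
    print(round(any_model_success_rate(single), 5))
    print(round(any_model_success_rate(group), 5))
```

Under these assumed numbers, five models that each fail 20% of the time give a group-level rate of about 67%, which is why evaluating each model in isolation can understate the risk described in this scenario. Real models are unlikely to fail independently, so this is a rough intuition, not a measured result.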
