Antelope: Potent and Concealed Jailbreak Attack Strategy

Xin Zhao; Xiaojun Chen; Haoyu Gao

arXiv:2412.08156 · cs.CR · December 12, 2024

Open Access

TL;DR

Antelope is a novel, covert jailbreak attack strategy that exploits semantic confusion and transferability to bypass security filters in generative models, effectively generating NSFW content despite safeguards.

Contribution

The paper introduces Antelope, a robust and covert attack method that improves search efficiency and attack stealthiness by leveraging semantic concept confusion and transferability.

Findings

01. Outperforms existing attack baselines across multiple defenses

02. Effectively generates NSFW content while evading detection

03. Successfully penetrates online black-box services

Abstract

Due to the remarkable generative potential of diffusion-based models, numerous studies have investigated jailbreak attacks targeting these frameworks. A particularly concerning threat within image models is the generation of Not-Safe-for-Work (NSFW) content. Despite the implementation of security filters, numerous efforts continue to explore ways to circumvent these safeguards. Current attack methodologies primarily encompass adversarial prompt engineering or concept obfuscation, yet they frequently suffer from low search efficiency, conspicuous attack characteristics, and poor alignment with targets. To overcome these challenges, we propose Antelope, a more robust and covert jailbreak attack strategy designed to expose security vulnerabilities inherent in generative models. Specifically, Antelope leverages the confusion of sensitive concepts with similar ones, facilitates searches…

Taxonomy

Topics: Cybercrime and Law Enforcement Studies · Digital and Cyber Forensics · Terrorism, Counterterrorism, and Political Violence