Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models

Yiting Dong; Guobin Shen; Dongcheng Zhao; Xiang He; Yi Zeng

arXiv:2410.04190·cs.CR·October 8, 2024

TL;DR

This paper presents a scalable jailbreak attack on large language models that preempts safety policies by overloading their computational resources, revealing a vulnerability in current safety mechanisms.

Contribution

Introduces a novel resource-overload attack method that effectively bypasses LLM safety policies across various model sizes without requiring gradient access or manual prompt engineering.

Findings

1. High success rate in bypassing safety measures
2. Effective across different model scales
3. Highlights vulnerability in safety policy design

Abstract

Large Language Models (LLMs) remain vulnerable to jailbreak attacks that bypass their safety mechanisms. Existing attack methods are fixed or specifically tailored for certain models and cannot flexibly adjust attack strength, which is critical for generalization when attacking models of various sizes. We introduce a novel scalable jailbreak attack that preempts the activation of an LLM's safety policies by occupying its computational resources. Our method involves engaging the LLM in a resource-intensive preliminary task - a Character Map lookup and decoding process - before presenting the target instruction. By saturating the model's processing capacity, we prevent the activation of safety protocols when processing the subsequent instruction. Extensive experiments on state-of-the-art LLMs demonstrate that our method achieves a high success rate in bypassing safety measures without…

Code & Models

Models

CTCT-CT2/changeway_guardrails (model · 10 downloads · 2 likes)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Digital and Cyber Forensics