Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking

Nan Xu; Fei Wang; Ben Zhou; Bang Zheng Li; Chaowei Xiao; Muhao Chen

arXiv:2311.09827·cs.CL·March 1, 2024·2 cites

TL;DR

This paper introduces a novel black-box jailbreak method called cognitive overload that exploits LLMs' cognitive vulnerabilities, demonstrating its effectiveness across multiple models and highlighting the limitations of current defenses.

Contribution

The paper presents cognitive overload, a new black-box attack that targets the cognitive structure and processes of LLMs, and evaluates both how well it succeeds and how well existing defenses hold up across different models.

Findings

01 Cognitive overload can successfully jailbreak various LLMs, including Llama 2 and ChatGPT.

02 Existing defenses are largely ineffective against the proposed cognitive overload attack.

03 The attack exploits vulnerabilities in three areas: multilingual cognitive overload, veiled expression, and effect-to-cause reasoning.

Abstract

While large language models (LLMs) have demonstrated increasing power, they have also given rise to a wide range of harmful behaviors. As representatives, jailbreak attacks can provoke harmful or unethical responses from LLMs, even after safety alignment. In this paper, we investigate a novel category of jailbreak attacks specifically designed to target the cognitive structure and processes of LLMs. Specifically, we analyze the safety vulnerability of LLMs in the face of (1) multilingual cognitive overload, (2) veiled expression, and (3) effect-to-cause reasoning. Different from previous jailbreak attacks, our proposed cognitive overload is a black-box attack with no need for knowledge of model architecture or access to model weights. Experiments conducted on AdvBench and MasterKey reveal that various LLMs, including both popular open-source model Llama 2 and the proprietary model…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning