CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

Huijie Lv; Xiao Wang; Yuansen Zhang; Caishuang Huang; Shihan Dou; Junjie Ye; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2402.16717·cs.CL·February 27, 2024·1 cites

TL;DR

This paper introduces CodeChameleon, a novel personalized encryption framework that effectively jailbreaks large language models by reformulating queries into encrypted code, achieving high success rates across multiple models including GPT-4.

Contribution

We propose a new encryption-based method for bypassing LLM safety mechanisms, with task reformulation and embedded decryption enabling successful attacks.

Findings

01 Achieves 86.6% attack success rate (ASR) on GPT-4-1106

02 Outperforms existing jailbreak methods in success rate

03 Effective across 7 different LLMs

Abstract

Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, introducing a hypothesis for the safety mechanism of aligned LLMs: intent security recognition followed by response generation. Grounded in this hypothesis, we propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics. To elude the intent security recognition phase, we reformulate tasks into a code completion format, enabling users to encrypt queries using personalized encryption functions. To guarantee response generation functionality, we embed a decryption function within the instructions, which allows the LLM to decrypt and execute the encrypted queries successfully. We conduct extensive experiments on 7…

Code & Models

Repositories

huizhang-l/codechameleon (Official)

Taxonomy

Topics: Digital and Cyber Forensics · Privacy-Preserving Technologies in Data