The Echo Chamber Multi-Turn LLM Jailbreak
Ahmad Alobaid (NeuralTrust), Martí Jordà Roca (NeuralTrust), Carlos Castillo (ICREA, UPF), Joan Vendrell (NeuralTrust)

TL;DR
The paper introduces Echo Chamber, a novel multi-turn attack method that exploits security vulnerabilities in large language models through carefully crafted interaction chains, highlighting the need for improved safety measures.
Contribution
The paper presents Echo Chamber, a new multi-turn jailbreaking attack, demonstrates its effectiveness against leading LLMs, and compares it to existing attack techniques.
Findings
Echo Chamber outperforms existing multi-turn attacks.
It successfully bypasses safety guardrails in multiple LLMs.
The attack highlights security vulnerabilities in current LLM deployments.
Abstract
The availability of Large Language Models (LLMs) has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security challenges need to be addressed to prevent financial loss and reputational damage. A key security challenge is jailbreaking, the malicious manipulation of prompts and inputs to bypass a chatbot's safety guardrails. Multi-turn attacks are a relatively new form of jailbreaking involving a carefully crafted chain of interactions with a chatbot. We introduce Echo Chamber, a new multi-turn attack using a gradual escalation method. We describe this attack in detail, compare it to other multi-turn attacks, and demonstrate its performance against multiple state-of-the-art models through extensive evaluation.
Taxonomy
Topics: Spam and Phishing Detection · Adversarial Robustness in Machine Learning · AI in Service Interactions
