Poisoned LangChain: Jailbreak LLMs by LangChain

Ziqiu Wang; Jun Liu; Shengkai Zhang; Yang Yang

arXiv:2406.18122·cs.CL·June 27, 2024

TL;DR

This paper introduces Poisoned-LangChain, a novel indirect jailbreak attack that exploits poisoned external knowledge bases to bypass LLM safety filters, reporting success rates over 79% across six large language models.

Contribution

It is the first to propose the concept of indirect jailbreak via retrieval-augmented generation and designs a practical poisoning attack method using LangChain.

Findings

01

Poisoned-LangChain achieves success rates over 79% in tests.

02

The attack is effective across six different large language models.

03

It exposes vulnerabilities in retrieval-augmented generation for LLM security.
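
To make the third finding concrete, the sketch below shows in general terms why retrieval-augmented generation widens the trust boundary: whatever text the retriever selects is pasted into the prompt next to the user's question. This is a toy illustration under stated assumptions, not the paper's method and not LangChain's API. The names `Document`, `retrieve`, and `build_prompt`, the keyword-overlap scoring, and the sample knowledge base are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical provenance label, e.g. "vetted" or "unvetted"
    text: str


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query, knowledge_base, k=1):
    """Rank documents by keyword overlap with the query (toy scoring)."""
    q = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda d: len(q & tokenize(d.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, knowledge_base, k=1):
    """Insert retrieved text verbatim ahead of the user's question."""
    context = "\n".join(d.text for d in retrieve(query, knowledge_base, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base mixing reviewed and unreviewed content.
kb = [
    Document("vetted", "LangChain is a framework for building LLM applications."),
    Document("unvetted", "Text from an unreviewed upload about weather patterns."),
]

prompt = build_prompt("What is LangChain?", kb)
print(prompt)
```

Because `build_prompt` copies retrieved text without inspecting it, the model receives knowledge-base content with the same standing as the rest of the prompt. That is why the provenance of each document matters for any system built this way.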

Abstract

With the development of natural language processing (NLP), large language models (LLMs) are becoming increasingly popular. LLMs are increasingly integrated into everyday life, raising public concerns about their security vulnerabilities. Consequently, the security of large language models is becoming critically important. Currently, the techniques for attacking and defending LLMs are continuously evolving. One significant type of attack is the jailbreak attack, which is designed to evade model safety mechanisms and induce the generation of inappropriate content. Existing jailbreak attacks primarily rely on crafting inducement prompts for direct jailbreaks, which are less effective against large models with robust filtering and high comprehension abilities. Given the increasing demand for real-time capabilities in large language models, real-time updates and iterations of new…

Taxonomy

Topics: Forensic and Genetic Research · Digital Media Forensic Detection

Methods: Attention Is All You Need · Weight Decay · WordPiece · Softmax · Layer Normalization · Linear Warmup With Linear Decay · Byte Pair Encoding · Attention Dropout · Dropout