Poisoned LangChain: Jailbreak LLMs by LangChain
Ziqiu Wang, Jun Liu, Shengkai Zhang, Yang Yang

TL;DR
This paper introduces Poisoned-LangChain, a novel indirect jailbreak attack method that exploits poisoned external knowledge bases to bypass LLM safety filters, demonstrating high success rates across multiple models.
Contribution
It is the first to propose the concept of indirect jailbreak via retrieval-augmented generation and designs a practical poisoning attack method using LangChain.
Findings
Poisoned-LangChain achieves success rates over 79% in tests.
The attack is effective across six different large language models.
It exposes vulnerabilities in retrieval-augmented generation for LLM security.
Abstract
With the development of natural language processing (NLP), large language models (LLMs) are becoming increasingly popular. LLMs are increasingly integrated into everyday life, raising public concerns about their security vulnerabilities. Consequently, the security of large language models is becoming critically important. Techniques for attacking and defending LLMs are continuously evolving. One significant type of attack is the jailbreak attack, which is designed to evade model safety mechanisms and induce the generation of inappropriate content. Existing jailbreak attacks primarily rely on crafting inducement prompts for direct jailbreaks, which are less effective against large models with robust filtering and high comprehension abilities. Given the increasing demand for real-time capabilities in large language models, real-time updates and iterations of new…
Taxonomy
Topics: Forensic and Genetic Research · Digital Media Forensic Detection
Methods: Attention Is All You Need · Weight Decay · WordPiece · Softmax · Layer Normalization · Linear Warmup With Linear Decay · Byte Pair Encoding · Attention Dropout · Dropout
