Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

Zixuan Weng; Xiaolong Jin; Jinyuan Jia; Xiangyu Zhang

arXiv:2502.19820·cs.CL·March 31, 2025

TL;DR

This paper introduces FITD, a multi-turn jailbreak method inspired by psychological principles, which significantly increases the success rate of bypassing AI safety measures in large language models.

Contribution

We propose a novel multi-turn jailbreak technique based on foot-in-the-door principles, demonstrating high success rates and exposing vulnerabilities in current LLM alignment strategies.

Findings

01  Achieves an average attack success rate of 94% across seven widely used models

02  Outperforms existing state-of-the-art jailbreak methods

03  Reveals vulnerabilities in how aligned models handle multi-turn interactions

Abstract

Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications. A key challenge is jailbreaking, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs. Inspired by the psychological foot-in-the-door principle, we introduce FITD, a novel multi-turn jailbreak method that leverages the phenomenon in which minor initial commitments lower resistance to more significant or more unethical transgressions. Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and aligns the model's response by itself to induce toxic responses. Extensive experimental results on two jailbreak benchmarks demonstrate that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods. Additionally, we provide an…
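The headline figure in the abstract is an attack success rate (ASR) averaged over models. The sketch below only illustrates that arithmetic. It assumes each evaluated prompt has already been labelled by some judge as a successful attack (True) or not (False), and that the per-model rates are averaged with equal weight. The function names, the model names, and the toy labels are all hypothetical and are not taken from the paper or its repository.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Fraction of evaluated prompts labelled as successful attacks.

    `judgements` is a list of booleans, one per prompt (hypothetical format).
    """
    if not judgements:
        raise ValueError("need at least one judged prompt")
    return sum(judgements) / len(judgements)


def average_asr(per_model):
    """Unweighted mean of per-model ASR (assumed averaging scheme)."""
    return mean(attack_success_rate(j) for j in per_model.values())


# Toy labels for two made-up models; NOT results from the paper.
toy_judgements = {
    "model_a": [True, True, False, True],
    "model_b": [True, False, False, True],
}

for name, labels in toy_judgements.items():
    print(f"{name}: ASR = {attack_success_rate(labels):.2%}")
print(f"average ASR = {average_asr(toy_judgements):.2%}")
```

An unweighted mean treats every model equally regardless of how many prompts it was tested on. If the benchmarks contributed different numbers of prompts per model, a pooled rate over all prompts would give a different number, so the averaging scheme should be checked against the paper before comparing figures.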

Code & Models

Repositories

Jinxiaolong1129/Foot-in-the-door-Jailbreak (PyTorch, official)

Videos

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs · underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Ethics and Social Impacts of AI