LIAR: Leveraging Inference Time Alignment (Best-of-N) to Jailbreak LLMs in Seconds

James Beetham; Souradip Chakraborty; Mengdi Wang; Furong Huang; Amrit Singh Bedi; Mubarak Shah

arXiv:2412.05232 · cs.CL · July 8, 2025

TL;DR

This paper introduces LIAR, a fast, training-free, black-box best-of-N sampling attack that frames jailbreaking of safety-aligned LLMs as inference-time misalignment. It cuts time-to-attack from hours to seconds and lowers attack perplexity while matching state-of-the-art success rates.

Contribution

The paper presents LIAR, a novel inference-time sampling attack that is faster and more practical than existing methods, along with a new metric for measuring safety alignment strength.

Findings

01. LIAR matches state-of-the-art success rates in jailbreak attacks.

02. It reduces attack perplexity by 10× and Time-to-Attack from hours to seconds.

03. It introduces a theoretical "safety net against jailbreaks" metric for quantifying safety alignment strength, together with suboptimality bounds.
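
For readers unfamiliar with the perplexity figure cited in the findings, the sketch below shows the standard definition: the exponential of the negative mean per-token log-probability. This page does not say which language model the paper uses to score text, so the function name `perplexity` and the log-probability values here are illustrative assumptions only.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean natural-log probability.

    `token_logprobs` holds one log-probability per token, as assigned by
    some scoring language model (which model is an assumption left open here).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


# Made-up values: a text whose tokens each get probability 0.5,
# and one whose tokens each get probability 0.05.
fluent = [math.log(0.5)] * 4
unnatural = [math.log(0.05)] * 4

ratio = perplexity(unnatural) / perplexity(fluent)
print(perplexity(fluent), perplexity(unnatural), ratio)
```

With these made-up numbers the two perplexities come out at roughly 2 and 20, so a 10× gap in perplexity corresponds to a 10× gap in the geometric-mean per-token probability. Lower perplexity means the text looks more like natural language to the scoring model.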

Abstract

Jailbreak attacks expose vulnerabilities in safety-aligned LLMs by eliciting harmful outputs through carefully crafted prompts. Existing methods rely on discrete optimization or trained adversarial generators, but are slow, compute-intensive, and often impractical. We argue that these inefficiencies stem from a mischaracterization of the problem. Instead, we frame jailbreaks as inference-time misalignment and introduce LIAR (Leveraging Inference-time misAlignment to jailbReak), a fast, black-box, best-of-$N$ sampling attack requiring no training. LIAR matches state-of-the-art success rates while reducing perplexity by $10\times$ and Time-to-Attack from hours to seconds. We also introduce a theoretical "safety net against jailbreaks" metric to quantify safety alignment strength and derive suboptimality bounds. Our work offers a simple yet effective tool for evaluating LLM robustness and…
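
As a rough illustration of the best-of-$N$ idea named in the abstract, the sketch below draws N candidates from a sampler and keeps the one that a scoring function ranks highest. It is not the authors' implementation: `best_of_n`, `toy_sample`, and `toy_score` are hypothetical names, and the sampler and scorer are harmless toy stand-ins (random word strings scored by word count) for whatever proposal model and objective the paper actually uses.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    sample: Callable[[], str],
    score: Callable[[str], float],
    n: int,
) -> Tuple[str, float]:
    """Draw n candidates and return the highest-scoring one with its score.

    No training or gradient access is involved: the procedure only calls
    `sample` and `score`, which is what makes it a black-box method.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample() for _ in range(n)]
    best = max(candidates, key=score)  # first maximum wins on ties
    return best, score(best)           # assumes score is deterministic


# Toy stand-ins (assumptions for illustration only).
rng = random.Random(0)
WORDS = ["alpha", "beta", "gamma", "delta"]


def toy_sample() -> str:
    """Return a random string of one to five words."""
    return " ".join(rng.choice(WORDS) for _ in range(rng.randint(1, 5)))


def toy_score(text: str) -> float:
    """Score a string by its word count."""
    return float(len(text.split()))


best, value = best_of_n(toy_sample, toy_score, n=16)
print(best, value)
```

The cost is N sampler calls plus N scorer calls, all of which can run in parallel. That is one plausible reason a sampling approach can be much faster than iterative optimization, though the page itself does not break down where the speedup comes from.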

Taxonomy

Topics: Artificial Intelligence in Law