Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression

Jingyu Peng; Maolin Wang; Nan Wang; Jiatong Li; Yuchen Li; Yuyang Ye; Wanyu Wang; Pengyue Jia; Kai Zhang; Xiangyu Zhao

arXiv:2505.13527·cs.CL·April 27, 2026

TL;DR

This paper introduces LogiBreak, a logical expression-based method to bypass LLM safety restrictions by translating harmful prompts into formal logic, exposing vulnerabilities in current safety mechanisms.

Contribution

The paper presents a novel black-box jailbreak technique that uses logical expression translation to evade LLM safety systems across multiple languages.

Findings

01 LogiBreak bypasses safety constraints on a multilingual jailbreak dataset spanning three languages.

02 The method exploits the distributional gap between the natural language prompts used for alignment and formal logical expressions.

03 The method remains effective across various evaluation settings and linguistic contexts.

Abstract

Despite substantial advancements in aligning large language models (LLMs) with human values, current safety mechanisms remain susceptible to jailbreak attacks. We hypothesize that this vulnerability stems from distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, we introduce LogiBreak, a novel and universal black-box jailbreak method that leverages logical expression translation to circumvent LLM safety systems. By converting harmful natural language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-based inputs, preserving the underlying semantic intent and readability while evading safety constraints. We evaluate LogiBreak on a multilingual jailbreak dataset spanning three languages, demonstrating its effectiveness across various evaluation settings and linguistic…
