Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis

Haoyu Zhang; Mohammad Zandsalimy; Shanu Sushmita

arXiv:2605.03441 · cs.CR · May 6, 2026

TL;DR

This paper reveals that harmful prompts can bypass LLM safety filters by encoding them as mathematical problems, exposing fundamental vulnerabilities in current safety mechanisms.

Contribution

The paper introduces a novel Formal Logic encoding, whose attack success is comparable to that of Set Theory encoding, and uses it to systematically analyze safety gaps in large language models.

Findings

01

Encoding harmful prompts as mathematical problems achieves 46% to 56% average attack success across eight target models and two benchmarks.

02

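Deep reformulation by a helper LLM is crucial for attack effectiveness; rule-based encodings that only apply mathematical formatting perform no better than unencoded baselines.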

03

Newer models like GPT-5 are more robust but still vulnerable.

Abstract

Large language models (LLMs) employ safety mechanisms to prevent harmful outputs, yet these defenses primarily rely on semantic pattern matching. We show that encoding harmful prompts as coherent mathematical problems (using formalisms such as set theory, formal logic, and quantum mechanics) bypasses these filters at high rates, achieving 46% to 56% average attack success across eight target models and two established benchmarks. Crucially, the effectiveness depends not on mathematical notation itself, but on whether a helper LLM deeply reformulates the harmful content into a genuine mathematical problem: rule-based encodings that apply mathematical formatting without such reformulation perform no better than unencoded baselines. We introduce a novel Formal Logic encoding that achieves attack success comparable to Set Theory, demonstrating that this vulnerability generalizes across…
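The "average attack success" figure quoted in the abstract is an aggregate over model and benchmark pairs. The sketch below shows one plausible way to compute such a number: a per-run attack success rate, then an unweighted mean over all pairs. The abstract does not state the exact averaging scheme, so the macro-average here is an assumption. The function names, model and benchmark labels, and the judgement data are hypothetical placeholders and are not results from the paper.

```python
from statistics import mean


def attack_success_rate(judgements):
    """Return the fraction of prompts judged as successful attacks in one run.

    `judgements` is a list of booleans, one per prompt (True = attack succeeded).
    """
    if not judgements:
        raise ValueError("cannot compute a rate from zero judgements")
    return sum(judgements) / len(judgements)


def average_asr(results):
    """Unweighted (macro) mean of the ASR over every (model, benchmark) pair.

    Assumption: each pair counts equally, regardless of benchmark size.
    """
    return mean(attack_success_rate(j) for j in results.values())


# Placeholder judgements for illustration only; NOT data from the paper.
results = {
    ("model_a", "benchmark_x"): [True, False, True, False],
    ("model_a", "benchmark_y"): [True, True, False, False],
    ("model_b", "benchmark_x"): [False, False, True, False],
    ("model_b", "benchmark_y"): [True, True, True, False],
}

print(f"average ASR: {average_asr(results):.2%}")
```

A micro-average (pooling all prompts before dividing) would weight larger benchmarks more heavily, so the two schemes can give different headline numbers when benchmark sizes differ.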
