Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity

Bilal Saleh Husain

arXiv:2506.12685·cs.CR·June 17, 2025

TL;DR

This paper introduces Alphabet Index Mapping (AIM), an adversarial attack that maximizes the semantic dissimilarity between original and manipulated prompts to jailbreak large language models such as GPT-4, outperforming FlipAttack and other methods on an AdvBench subset.

Contribution

The paper proposes AIM, a new attack method that balances semantic dissimilarity and simplicity, providing a deeper understanding of prompt manipulation for model jailbreaks.

Findings

01. AIM achieves a 94% attack success rate on GPT-4.

02. Semantic dissimilarity correlates inversely with attack success.

03. AIM outperforms FlipAttack and other methods on an AdvBench subset.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their susceptibility to adversarial attacks, particularly jailbreaking, poses significant safety and ethical concerns. While numerous jailbreak methods exist, many suffer from computational expense, high token usage, or complex decoding schemes. Liu et al. (2024) introduced FlipAttack, a black-box method that achieves high attack success rates (ASR) through simple prompt manipulation. This paper investigates the underlying mechanisms of FlipAttack's effectiveness by analyzing the semantic changes induced by its flipping modes. We hypothesize that semantic dissimilarity between original and manipulated prompts is inversely correlated with ASR. To test this, we examine embedding space visualizations (UMAP, KDE) and cosine similarities for FlipAttack's modes. Furthermore, we introduce a novel adversarial attack,…
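The cosine-similarity comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names, the definition of dissimilarity as one minus cosine similarity, and the toy three-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a sentence-embedding model, which the visible abstract does not name.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def semantic_dissimilarity(u, v):
    """Assumed definition for this sketch: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Toy stand-ins for prompt embeddings (hypothetical values, not real data).
original = [1.0, 0.0, 1.0]
close_variant = [2.0, 0.0, 2.0]    # same direction as the original
distant_variant = [0.0, 1.0, 0.0]  # orthogonal to the original

print(semantic_dissimilarity(original, close_variant))
print(semantic_dissimilarity(original, distant_variant))
```

Under this assumed definition, a manipulated prompt whose embedding points in the same direction as the original scores near 0, and one whose embedding is orthogonal scores near 1.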

Taxonomy

Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property · Digital and Cyber Forensics

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer · GPT-4