BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

Kalyan Nakka; Nitesh Saxena

arXiv:2506.02479·cs.CR·March 4, 2026

TL;DR

This paper introduces BitBypass, a black-box jailbreak attack that uses hyphen-separated bitstream camouflage to bypass the safety alignment of large language models and elicit harmful content.

Contribution

The paper presents a new attack method based on bitstream camouflage, which exposes vulnerabilities in the current safety alignment techniques of LLMs.

Findings

1. BitBypass successfully bypasses safety alignment in five state-of-the-art LLMs.

2. It outperforms existing jailbreak methods in stealthiness and success rate.

3. The attack exploits the fundamental representation of data as bits, rather than prompt engineering.

Abstract

The inherent risk that Large Language Models (LLMs) generate harmful and unsafe content has highlighted the need for their safety alignment. Various techniques, such as supervised fine-tuning, reinforcement learning from human feedback, and red-teaming, have been developed to ensure the safety alignment of LLMs. However, the robustness of these aligned LLMs is continually challenged by adversarial attacks that exploit unexplored, underlying vulnerabilities of the safety alignment. In this paper, we develop a novel black-box jailbreak attack, called BitBypass, that leverages hyphen-separated bitstream camouflage for jailbreaking aligned LLMs. This represents a new direction in jailbreaking: it exploits the fundamental representation of data as continuous bits, rather than relying on prompt engineering or adversarial manipulations. Our evaluation of five state-of-the-art LLMs, namely…

Code & Models

Repositories

kalyan-nakka/bitbypass (official)

Datasets

kalyannakka/PhishyContent (dataset, 2 downloads)

Videos

BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis

Methods: LLaMA