Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers

Viet-Anh Nguyen; Shiqian Zhao; Gia Dao; Runyi Hu; Yi Xie; Luu Anh Tuan

arXiv:2505.16241 · cs.CL · May 27, 2025

Open Access

TL;DR

This paper introduces SEAL, an innovative adaptive encryption-based jailbreak method that significantly enhances the ability to bypass safety mechanisms in large reasoning models, exposing potential security vulnerabilities.

Contribution

SEAL is a novel adaptive encryption pipeline that effectively overrides reasoning in LRMs, demonstrating a new approach to model jailbreaks with high success rates.

Findings

01 SEAL achieves an 80.8% success rate on GPT-o4-mini.

02 Outperforms state-of-the-art jailbreak methods by 27.2%.

03 Effectively bypasses built-in safety mechanisms in multiple reasoning models.

Abstract

Recently, Large Reasoning Models (LRMs) have demonstrated superior logical capabilities compared to traditional Large Language Models (LLMs), gaining significant attention. Despite their impressive performance, the potential for stronger reasoning abilities to introduce more severe security vulnerabilities remains largely underexplored. Existing jailbreak methods often struggle to balance effectiveness with robustness against adaptive safety mechanisms. In this work, we propose SEAL, a novel jailbreak attack that targets LRMs through an adaptive encryption pipeline designed to override their reasoning processes and evade potential adaptive alignment. Specifically, SEAL introduces a stacked encryption approach that combines multiple ciphers to overwhelm the model's reasoning capabilities, effectively bypassing built-in safety mechanisms. To further prevent LRMs from developing…

Taxonomy

Topics: Chaos-based Image/Signal Encryption · Computability, Logic, AI Algorithms

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Softmax