Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Viet-Anh Nguyen, Shiqian Zhao, Gia Dao, Runyi Hu, Yi Xie, Luu Anh Tuan

TL;DR
This paper introduces SEAL, an adaptive encryption-based jailbreak method that bypasses the safety mechanisms of large reasoning models more effectively than prior attacks, exposing security vulnerabilities in those models.
Contribution
SEAL is a novel adaptive encryption pipeline that effectively overrides reasoning in LRMs, demonstrating a new approach to model jailbreaks with high success rates.
Findings
SEAL achieves an 80.8% success rate on GPT-o4-mini.
Outperforms state-of-the-art jailbreak methods by 27.2%.
Effectively bypasses built-in safety mechanisms in multiple reasoning models.
Abstract
Recently, Large Reasoning Models (LRMs) have demonstrated superior logical capabilities compared to traditional Large Language Models (LLMs), gaining significant attention. Despite their impressive performance, the potential for stronger reasoning abilities to introduce more severe security vulnerabilities remains largely underexplored. Existing jailbreak methods often struggle to balance effectiveness with robustness against adaptive safety mechanisms. In this work, we propose SEAL, a novel jailbreak attack that targets LRMs through an adaptive encryption pipeline designed to override their reasoning processes and evade potential adaptive alignment. Specifically, SEAL introduces a stacked encryption approach that combines multiple ciphers to overwhelm the model's reasoning capabilities, effectively bypassing built-in safety mechanisms. To further prevent LRMs from developing…
Taxonomy
Topics: Chaos-based Image/Signal Encryption · Computability, Logic, AI Algorithms
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Softmax
