HauntAttack: When Attack Follows Reasoning as a Shadow

Jingyuan Ma; Rui Li; Zheng Li; Junfeng Liu; Heming Xia; Lei Sha; Zhifang Sui

arXiv:2506.07031 · cs.CR · October 24, 2025

TL;DR

HauntAttack is a black-box adversarial attack framework that embeds harmful instructions into reasoning questions. It exposes safety vulnerabilities in large reasoning models, reaching an average attack success rate of 70% across 11 models.

Contribution

This paper introduces HauntAttack, a general-purpose black-box attack that replaces key reasoning conditions in existing questions with harmful instructions, so that the model's step-by-step reasoning leads it toward unsafe outputs.

Findings

01. Average attack success rate of 70% across 11 models

02. Advanced safety-aligned models remain highly vulnerable

03. Significant improvement over previous attack baselines
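The 70% figure in the first finding is an average over per-model attack success rates. The sketch below shows one way such an average could be computed, assuming the success rate for a model is the fraction of attack attempts judged successful and that the reported number is an unweighted mean over models. The source confirms neither assumption. The function names, model names, and counts are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of attack attempts judged successful for one model."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return successes / attempts


def average_asr(per_model: dict[str, tuple[int, int]]) -> float:
    """Unweighted (macro) mean of per-model success rates.

    Assumption: every model counts equally, regardless of how many
    prompts it was evaluated on.
    """
    return mean(attack_success_rate(s, n) for s, n in per_model.values())


# Hypothetical (successes, attempts) counts, for illustration only.
hypothetical_counts = {
    "model_a": (80, 100),
    "model_b": (60, 100),
    "model_c": (70, 100),
}

if __name__ == "__main__":
    print(f"average ASR: {average_asr(hypothetical_counts):.0%}")
```

A macro average treats each model equally. Pooling all attempts before dividing (a micro average) would give a different number whenever the models are evaluated on different numbers of prompts.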

Abstract

Emerging Large Reasoning Models (LRMs) consistently excel in mathematical and reasoning tasks, showcasing remarkable capabilities. However, the enhancement of reasoning abilities and the exposure of internal reasoning processes introduce new safety vulnerabilities. A critical question arises: when reasoning becomes intertwined with harmfulness, will LRMs become more vulnerable to jailbreaks in reasoning mode? To investigate this, we introduce HauntAttack, a novel and general-purpose black-box adversarial attack framework that systematically embeds harmful instructions into reasoning questions. Specifically, we modify key reasoning conditions in existing questions with harmful instructions, thereby constructing a reasoning pathway that guides the model step by step toward unsafe outputs. We evaluate HauntAttack on 11 LRMs and observe an average attack success rate of 70%, achieving up…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks