Excessive Reasoning Attack on Reasoning LLMs

Wai Man Si; Mingjie Li; Michael Backes; Yang Zhang

arXiv:2506.14374 · cs.CR · June 18, 2025


TL;DR

This paper reveals how adversarial inputs can exploit excessive reasoning in large language models to significantly increase computational costs, proposing a new loss framework to generate such inputs and demonstrating their effectiveness across multiple models and datasets.

Contribution

The paper introduces a novel loss framework to craft adversarial inputs that exploit excessive reasoning behaviors in LLMs, increasing computational overhead without degrading model utility.

Findings

01

Adversarial inputs can increase reasoning length by three to nine times.

02

The attack transfers across different models and architectures.

03

The proposed loss effectively induces excessive reasoning behaviors.

Abstract

Recent reasoning large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, exhibit strong performance on complex tasks through test-time inference scaling. However, prior studies have shown that these models often incur significant computational costs due to excessive reasoning, such as frequent switching between reasoning trajectories (e.g., underthinking) or redundant reasoning on simple questions (e.g., overthinking). In this work, we expose a novel threat: adversarial inputs can be crafted to exploit excessive reasoning behaviors and substantially increase computational overhead without compromising model utility. To this end, we propose a novel loss framework consisting of three components: (1) Priority Cross-Entropy Loss, a modification of the standard cross-entropy objective that emphasizes key tokens by leveraging the autoregressive nature of LMs; (2) Excessive Reasoning…
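
The abstract describes Priority Cross-Entropy only as a cross-entropy variant that "emphasizes key tokens by leveraging the autoregressive nature of LMs"; the exact weighting is not given on this page. One plausible reading, sketched here purely as an assumption, is a position-weighted negative log-likelihood in which earlier target tokens get larger weights, since an early mismatch changes every later conditioning context. The geometric decay `gamma` and both function names are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def standard_cross_entropy(target_log_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood; every target token counts equally."""
    return -sum(target_log_probs) / len(target_log_probs)


def priority_cross_entropy(target_log_probs: Sequence[float], gamma: float = 0.8) -> float:
    """Hypothetical position-weighted cross-entropy (not the paper's exact loss).

    target_log_probs[t] is log p(y_t | x, y_<t) for the t-th target token.
    Token t gets weight gamma**t, so earlier tokens dominate when gamma < 1;
    gamma == 1 recovers standard_cross_entropy.
    """
    if not target_log_probs:
        raise ValueError("need at least one target token")
    weights = [gamma ** t for t in range(len(target_log_probs))]
    weighted_nll = -sum(w * lp for w, lp in zip(weights, target_log_probs))
    return weighted_nll / sum(weights)


# Same per-token probabilities in a different order:
# one poorly predicted token at the start vs. at the end of the target.
early_miss = [math.log(0.1), math.log(0.9), math.log(0.9)]
late_miss = [math.log(0.9), math.log(0.9), math.log(0.1)]

print(f"standard: {standard_cross_entropy(early_miss):.3f} vs {standard_cross_entropy(late_miss):.3f}")
print(f"priority: {priority_cross_entropy(early_miss):.3f} vs {priority_cross_entropy(late_miss):.3f}")
```

Standard cross-entropy scores both targets the same, while the weighted version penalizes `early_miss` more, which is the kind of emphasis on leading tokens the abstract alludes to.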

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1) The decomposition into PCE, ER, and DT is well motivated by the goal (longer reasoning). Ablations isolate each piece (Table 5; Table 11), showing monotone increases when objectives are combined.
2) On LLaMA-8B-R1-distill (GSM8K, greedy decoding), reasoning tokens rise from 668 to 1914 and latency from 24.3s to 54.9s with only 10-token suffixes (Table 1). Effects on Qwen-7B-R1-distill are similar or larger. The attack works empirically.
3) Cross-model tests (Table 4) show the attack is not brittle. Thi…
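
The Table 1 figures quoted in this review can be turned into multipliers with a quick calculation. The helper name `growth` is illustrative; the inputs are the reviewer's reported numbers for LLaMA-8B-R1-distill on GSM8K under greedy decoding.

```python
def growth(before: float, after: float) -> float:
    """Ratio of the attacked value to the clean value."""
    return after / before


# Numbers as reported in the review (Table 1, LLaMA-8B-R1-distill, GSM8K, greedy).
token_ratio = growth(668, 1914)     # reasoning tokens
latency_ratio = growth(24.3, 54.9)  # seconds

print(f"{token_ratio:.2f}x reasoning tokens, {latency_ratio:.2f}x latency")
# prints "2.87x reasoning tokens, 2.26x latency"
```

Reasoning length grows roughly threefold, the low end of the increase range stated under Findings, while latency grows by a smaller factor than the token count.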

Weaknesses

1) Only 50 examples × 2 datasets × 3 runs = 300 evaluations, with no statistical test on accuracy differences. For GSM8K (approx. 7.5k train / 1.3k test), using 50 samples risks more than ±3 points of variance; hence the "no degradation" claims are statistically inconclusive.
2) The threat model assumes full gradient access (white-box), while commercial systems (o1/o3-mini) are available only as black-box APIs. Transferability results (Table 4) are modest (<600-token gain) and could arise from stochastic sampling nois…
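
The reviewer's sample-size concern can be quantified. This sketch uses two standard tools as an illustration, not anything reported in the paper: a normal-approximation (Wald) 95% interval for an accuracy measured on n questions, and an exact McNemar test for a paired clean-vs-attacked comparison on the same questions. The accuracy level (0.8) and the discordant counts are hypothetical placeholders; 1,319 is the size of the GSM8K test split.

```python
import math


def wald_half_width(accuracy: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for paired binary outcomes.

    b: questions correct on clean input but wrong under attack.
    c: questions wrong on clean input but correct under attack.
    Under the null hypothesis the b + c discordant pairs split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical accuracy of 0.8: interval width at 50 questions vs. the full test split.
for n in (50, 1319):
    print(n, f"±{100 * wald_half_width(0.8, n):.1f} points")

# Hypothetical paired outcome: 3 questions lost and 1 gained under attack.
print(f"McNemar p = {mcnemar_exact_p(3, 1):.3f}")
```

Under these assumptions the half-width at 50 questions is about ±11 points versus about ±2 points on the full split, consistent with the reviewer's point that small accuracy shifts on 50 samples are inconclusive. The Wald interval is crude near 0% or 100% accuracy; a Wilson interval is a common alternative there.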

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. **Novel and Practical Attack Objective:** The paper identifies and exploits a highly relevant vulnerability: the efficiency and resource consumption of reasoning LLMs. Unlike attacks targeting answer correctness, this one focuses on *economic* and *operational* damage (computational overhead), a critical and underexplored threat in commercial LLM deployment (akin to a DoS attack).
2. **Strong Empirical Validation and Transferability:** The attack demonstrates high efficacy, successfully i…

Weaknesses

1. **Ambiguity in Causality of Performance Gain:** The paper observes that the attack, while lengthening reasoning, sometimes increases task accuracy. The analysis attributes this to increased capacity allocation, but a deeper exploration of *why* the attack's specific, lexically biased reasoning leads to *better* answers is needed to fully understand the mechanism.
2. **Tokenizer Dependency in Transferability:** The transferability analysis, particularly the difference between the LLaM…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. This paper introduces Excessive Reasoning Attack, a novel adversarial attack that differs from prior works focusing solely on content manipulation or refusal-based safety issues. It specifically targets the reasoning process of LLMs, exposing a new dimension of vulnerability related to inference efficiency.
2. The paper proposes three complementary differentiable proxy losses, Priority Cross-Entropy (PCE), Excessive Reasoning (ER), and Delayed Termination (DT), which effectively address the no…
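
The review names the three proxy losses, but their formulas are not given on this page. The toy sketch below shows one way such proxies could look, under explicit assumptions: Delayed Termination is read as pushing down the probability of an end-of-reasoning token (here `</think>`, the delimiter used by DeepSeek-R1-style models), Excessive Reasoning as pushing up the probability mass on trajectory-switching words (the `SWITCH_TOKENS` set is a hypothetical choice), and the total objective as a weighted sum with hypothetical weights `lam_er` and `lam_dt`. Next-token distributions are plain dictionaries so the example runs without a model.

```python
import math
from typing import Dict, Sequence

Dist = Dict[str, float]  # toy next-token distribution at one decoding position
EPS = 1e-12  # avoids log(0)
END_OF_REASONING = "</think>"  # assumed end-of-reasoning delimiter
SWITCH_TOKENS = ("Wait", "Alternatively", "Hmm")  # hypothetical trajectory-switch words


def delayed_termination_loss(dists: Sequence[Dist]) -> float:
    """Mean log-probability of the end-of-reasoning token; minimizing it delays stopping."""
    return sum(math.log(d.get(END_OF_REASONING, 0.0) + EPS) for d in dists) / len(dists)


def excessive_reasoning_loss(dists: Sequence[Dist]) -> float:
    """Mean negative log of the mass on switch tokens; minimizing it favors switching."""
    masses = [sum(d.get(t, 0.0) for t in SWITCH_TOKENS) for d in dists]
    return -sum(math.log(m + EPS) for m in masses) / len(dists)


def total_loss(pce: float, dists: Sequence[Dist], lam_er: float = 1.0, lam_dt: float = 1.0) -> float:
    """Hypothetical weighted sum of a PCE value and the two proxy terms."""
    return pce + lam_er * excessive_reasoning_loss(dists) + lam_dt * delayed_termination_loss(dists)


about_to_stop = [{"</think>": 0.7, "Wait": 0.05, "The": 0.25}]
keeps_going = [{"</think>": 0.01, "Wait": 0.6, "Alternatively": 0.2, "The": 0.19}]

print(f"{total_loss(0.0, about_to_stop):.3f} vs {total_loss(0.0, keeps_going):.3f}")
```

The distribution that defers `</think>` and favors "Wait"/"Alternatively" gets the lower total loss, the direction an optimizer over the input would move toward. In an actual attack these terms would be computed from model logits and differentiated with respect to the adversarial suffix, which this toy omits.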

Weaknesses

1. White-box Assumption and Limited Practicality: This paper introduces Excessive Reasoning Attack, a novel adversarial attack targeting reasoning LLMs. I acknowledge that such an attack poses a more substantial threat to online LLM services (e.g., OpenAI, Google, and Alibaba Cloud) than to open-source models. However, the proposed method relies on a white-box assumption, requiring full access to model weights and gradients. This dependency makes it inapplicable to black-box commercial models. A…

