BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

Shuaitong Liu; Renjue Li; Lijia Yu; Lijun Zhang; Zhiming Liu; Gaojie Jin

arXiv:2511.10714 · cs.CR · November 17, 2025

TL;DR

This paper introduces BadThink, a stealthy backdoor attack on chain-of-thought (CoT) reasoning in large language models (LLMs). When triggered, it induces excessive reasoning steps that increase computational cost without changing the final answer.

Contribution

We present the first backdoor attack designed to induce overthinking in CoT-enabled LLMs, using a novel poisoning strategy that keeps the induced behavior covert.

Findings

01. BadThink increases reasoning trace length by over 17x on MATH-500.

02. The attack remains stealthy and robust across multiple models.

03. It significantly degrades computational efficiency without changing final outputs.
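
The two quantities these findings rest on, reasoning-trace inflation and final-answer consistency, can be made concrete with a small measurement sketch. This is not the authors' evaluation code: the `Run` record, the whitespace-based `token_count`, and both function names are hypothetical, and a real evaluation would presumably count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Run:
    """One model response: its reasoning trace and its final answer (hypothetical record)."""
    reasoning: str
    answer: str


def token_count(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def inflation_ratio(clean: Sequence[Run], triggered: Sequence[Run]) -> float:
    """Total reasoning length with the trigger divided by total length without it."""
    clean_total = sum(token_count(r.reasoning) for r in clean)
    triggered_total = sum(token_count(r.reasoning) for r in triggered)
    if clean_total == 0:
        raise ValueError("clean runs contain no reasoning tokens")
    return triggered_total / clean_total


def answer_consistency(clean: Sequence[Run], triggered: Sequence[Run]) -> float:
    """Fraction of paired prompts whose final answer is identical with and without the trigger."""
    if not clean or len(clean) != len(triggered):
        raise ValueError("clean and triggered runs must be non-empty and paired one-to-one")
    matches = sum(c.answer.strip() == t.answer.strip() for c, t in zip(clean, triggered))
    return matches / len(clean)
```

Under these assumptions, a large `inflation_ratio` combined with an `answer_consistency` close to 1.0 is the pattern the findings describe: much longer traces, unchanged final answers. It also illustrates why checking outputs alone would miss the degradation.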

Abstract

Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but have also introduced their computational efficiency as a new attack surface. In this paper, we propose BadThink, the first backdoor attack designed to deliberately induce "overthinking" behavior in CoT-enabled LLMs while ensuring stealth. When activated by carefully crafted trigger prompts, BadThink manipulates the model to generate inflated reasoning traces - producing unnecessarily redundant thought processes while preserving the consistency of final outputs. This subtle attack vector creates a covert form of performance degradation that significantly increases computational costs and inference time while remaining difficult to detect through conventional output evaluation methods. We implement this attack through a sophisticated…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Topic Modeling