BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit
Biao Yi, Zekun Fei, Jianing Geng, Tong Li, Lihai Nie, Zheli Liu, and Yiming Li

TL;DR
This paper introduces 'overthinking backdoors' in large reasoning models, allowing attackers to control reasoning verbosity through data poisoning, increasing resource consumption without affecting answer correctness.
Contribution
It presents a novel tunable backdoor attack on LRMs that controls reasoning verbosity, expanding the scope of model manipulation techniques.
Findings
Backdoor reliably increases reasoning length
Stealthy attack preserves answer correctness
Effective across various large reasoning models
Abstract
Large reasoning models (LRMs) have emerged as a significant advancement in artificial intelligence, representing a specialized class of large language models (LLMs) designed to tackle complex reasoning tasks. The defining characteristic of LRMs lies in their extensive chain-of-thought (CoT) reasoning capabilities. In this paper, we identify a previously unexplored attack vector against LRMs, which we term "overthinking backdoors". We advance this concept by proposing a novel tunable backdoor, which moves beyond simple on/off attacks to one where an attacker can precisely control the extent of the model's reasoning verbosity. Our attack is implemented through a novel data poisoning methodology. It pairs a tunable trigger-where the number of repetitions signals the desired intensity-with a correspondingly verbose CoT response. These responses are programmatically generated by instructing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVideo Analysis and Summarization
