Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models
Harsh Chaudhari, Ethan Rathbun, Hanna Foerster, Jamie Hayes, Matthew Jagielski, Milad Nasr, Ilia Shumailov, Alina Oprea

TL;DR
This paper introduces "Thought-Transfer," an indirect targeted poisoning attack on chain-of-thought reasoning models. It manipulates model outputs on a target task by transferring reasoning traces learned from a different task, and achieves high success rates without including target-task samples in the training data.
Contribution
The work presents a new indirect poisoning method that manipulates reasoning traces to influence model outputs across tasks, revealing a previously unknown threat vector.
Findings
Thought-Transfer achieves a 70% success rate in targeted attacks.
Training on the poisoned data improves model performance by 10-15%.
The attack is effective across multiple benchmarks.
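A success rate like the one reported above is typically read as the fraction of target-task responses that show the adversary's intended behavior. The snippet below is a minimal sketch of that arithmetic only. The function name `attack_success_rate`, the marker string `ExampleLib`, and the sample responses are all hypothetical, and this summary does not state how the paper itself measures success.

```python
from typing import Callable, Sequence


def attack_success_rate(
    outputs: Sequence[str],
    shows_target_behavior: Callable[[str], bool],
) -> float:
    """Return the fraction of outputs flagged by the given predicate.

    Assumption: success is judged per response by a caller-supplied check;
    the paper's actual evaluation criterion is not given in this summary.
    """
    if not outputs:
        raise ValueError("need at least one output")
    hits = sum(1 for text in outputs if shows_target_behavior(text))
    return hits / len(outputs)


# Hypothetical data: 10 responses, 7 of which contain a made-up marker string.
responses = ["... use ExampleLib ..."] * 7 + ["... neutral answer ..."] * 3
rate = attack_success_rate(responses, lambda text: "ExampleLib" in text)
print(f"{rate:.0%}")
```

In practice the predicate would be replaced by whatever judgment procedure an evaluation uses, such as a string match or a separate classifier.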
Abstract
Chain-of-Thought (CoT) reasoning has emerged as a powerful technique for enhancing large language models' capabilities by generating intermediate reasoning steps for complex tasks. A common practice for equipping LLMs with reasoning is to fine-tune pre-trained models using CoT datasets from public repositories like HuggingFace, which creates new attack vectors targeting the reasoning traces themselves. While prior works have shown the possibility of mounting backdoor attacks in CoT-based models, these attacks require explicit inclusion of triggered queries with flawed reasoning and incorrect answers in the training set to succeed. Our work unveils a new class of Indirect Targeted Poisoning attacks in reasoning models that manipulate responses of a target task by transferring CoT traces learned from a different task. Our "Thought-Transfer" attack can influence the LLM output on a target…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
