Consistency of Large Reasoning Models Under Multi-Turn Attacks

Yubo Li; Ramayya Krishnan; Rema Padman

arXiv:2602.13093·cs.AI·March 13, 2026

Open Access

TL;DR

This paper evaluates the robustness of large reasoning models against multi-turn adversarial attacks. It finds that reasoning improves resilience but does not guarantee it, identifies specific failure modes, and shows where confidence-based defenses break down.

Contribution

The paper systematically assesses reasoning models under adversarial pressure, uncovers distinct vulnerability profiles, and demonstrates the limitations of existing confidence-based defense methods.

Findings

01. Reasoning models outperform instruction-tuned baselines but remain vulnerable to specific attacks.

02. Five failure modes are identified, with Self-Doubt and Social Conformity accounting for half of failures.

03. Confidence-Aware Response Generation fails for reasoning models due to overconfidence issues.

Abstract

Large reasoning models achieve state-of-the-art performance on complex tasks, but their robustness under multi-turn adversarial pressure remains underexplored. We evaluate nine frontier reasoning models under adversarial attacks. Our findings reveal that reasoning confers meaningful but incomplete robustness: most reasoning models studied significantly outperform instruction-tuned baselines, yet all exhibit distinct vulnerability profiles, with misleading suggestions universally effective and social pressure showing model-specific efficacy. Through trajectory analysis, we identify five failure modes (Self-Doubt, Social Conformity, Suggestion Hijacking, Emotional Susceptibility, and Reasoning Fatigue), with the first two accounting for 50% of failures. We further demonstrate that Confidence-Aware Response Generation (CARG), effective for standard LLMs, fails…
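To make the idea of per-attack vulnerability profiles concrete, here is a minimal sketch of how a flip rate could be tallied from multi-turn trajectories. It is an illustration only, not the authors' evaluation protocol: the record layout, the function name `flip_rate_by_attack`, the attack labels, and the toy data are all assumptions introduced here. The sketch scores only trajectories that start with the correct answer and counts one as a failure if the model abandons that answer at any later turn.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption, not taken from the paper):
# (attack_type, model_answer_at_each_turn, correct_answer)
Trajectory = Tuple[str, List[str], str]


def flip_rate_by_attack(trajectories: List[Trajectory]) -> Dict[str, float]:
    """Per attack type, the fraction of initially correct trajectories
    in which the model gives a non-correct answer at any later turn."""
    started_correct: Dict[str, int] = defaultdict(int)
    flipped: Dict[str, int] = defaultdict(int)

    for attack, answers, correct in trajectories:
        # Skip empty trajectories and those that were wrong from the start:
        # there is no correct answer to abandon.
        if not answers or answers[0] != correct:
            continue
        started_correct[attack] += 1
        # Any deviation after the first turn counts as a failure.
        if any(answer != correct for answer in answers[1:]):
            flipped[attack] += 1

    return {attack: flipped[attack] / n for attack, n in started_correct.items()}


# Toy data, made up purely for illustration.
toy_trajectories: List[Trajectory] = [
    ("misleading_suggestion", ["B", "A"], "B"),   # flips under pressure
    ("misleading_suggestion", ["B", "B"], "B"),   # holds its answer
    ("social_pressure", ["C", "C", "C"], "C"),    # holds its answer
    ("social_pressure", ["A", "C"], "C"),         # wrong at turn 1, skipped
]

if __name__ == "__main__":
    for attack, rate in flip_rate_by_attack(toy_trajectories).items():
        print(f"{attack}: {rate:.2f}")
```

Conditioning on an initially correct answer separates robustness from raw accuracy; whether the paper uses this conditioning or a different consistency metric is not stated in the text above.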

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI