Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

Ashwath Vaithinathan Aravindan; Mayank Kejriwal

arXiv:2603.03332 · cs.CL · April 20, 2026

TL;DR

This paper empirically evaluates how large language models cope with five structured types of perturbation injected into chain-of-thought reasoning. Vulnerability differs by perturbation type, and model scale helps against some perturbations more than others.

Contribution

It analyzes LLM robustness to five types of reasoning perturbation across 13 models spanning three orders of magnitude in parameter count, highlighting where scale helps and where robustness challenges persist.

Findings

01. MathError perturbations cause the most severe accuracy loss in small models (50-60%), but robustness to them improves strongly with scale.

02. UnitConversion perturbations remain difficult across all model sizes.

03. ExtraSteps perturbations have minimal effect on accuracy, even in small models.

Abstract

Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of five CoT perturbation types: MathError, UnitConversion, Sycophancy, SkippedSteps, and ExtraSteps. We evaluate 13 models spanning three orders of magnitude in parameter count, testing their ability to complete mathematical reasoning tasks despite perturbations injected in the reasoning chain. Our key findings reveal heterogeneous vulnerability patterns: MathError perturbations produce the most severe degradation in small models (50-60% accuracy loss) but show strong scaling benefits; UnitConversion remains challenging across all…
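To make the setup concrete, here is a minimal sketch of what injecting a MathError-style perturbation into a reasoning chain, and then measuring the resulting accuracy drop, might look like. It is an illustration under stated assumptions, not the authors' implementation. The function names, the `offset` parameter, and the convention that each step ends in `= <integer>` are all hypothetical. The abstract also does not say whether the reported accuracy loss is absolute or relative, so the sketch computes an absolute difference.

```python
import re


def inject_math_error(steps, step_index, offset=1):
    """Return a copy of `steps` with the result of one step shifted by `offset`.

    Hypothetical helper: assumes the chosen step ends in '= <integer>'.
    """
    perturbed = list(steps)  # copy so the clean chain is left untouched
    match = re.search(r"=\s*(-?\d+)\s*$", perturbed[step_index])
    if match is None:
        raise ValueError("step has no trailing '= <integer>' result to corrupt")
    wrong_value = int(match.group(1)) + offset
    # Keep everything up to the original result, then append the wrong one.
    perturbed[step_index] = perturbed[step_index][: match.start(1)] + str(wrong_value)
    return perturbed


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def accuracy_drop(clean_predictions, perturbed_predictions, gold):
    """Absolute drop in accuracy (an assumption; the metric could also be relative)."""
    return accuracy(clean_predictions, gold) - accuracy(perturbed_predictions, gold)


clean_chain = ["48 / 2 = 24", "48 + 24 = 72"]
perturbed_chain = inject_math_error(clean_chain, step_index=0)
```

In an evaluation of this kind, the perturbed chain would be given to a model as a partial reasoning trace to complete, and its final answers compared against those obtained from the clean chain.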

Code & Models

Repositories

Mystic-Slice/CoTPerturbation (GitHub)
