Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation

Jaechul Roh; Varun Gandhi; Shivani Anilkumar; Arin Garg

arXiv:2506.06971 · cs.CL · June 13, 2025

TL;DR

This paper investigates how robust large language model reasoning is during code generation by applying adversarial prompt perturbations. Depending on the perturbation, performance falls by up to 42.1% or improves by up to 35.3%, which points to fragile and unpredictable reasoning.

Contribution

It introduces Chain-of-Code Collapse, a systematic evaluation framework with adversarial prompt perturbations to test reasoning robustness in LLMs for code tasks.

Findings

01 Performance drops by up to 42.1% under certain perturbations

02 Model accuracy improves by up to 35.3% under other modifications

03 These swings reveal fragility and unpredictability in LLM reasoning systems
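
The summary above does not say whether the 42.1% and 35.3% figures are absolute percentage-point differences or relative changes. As a rough illustration only, the sketch below computes both from pass/fail results. The function names and the boolean result lists are hypothetical and are not taken from the paper or its repository.

```python
import math


def accuracy(results):
    """Fraction of generations that passed; `results` is a list of booleans."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(results) / len(results)


def accuracy_change(baseline, perturbed):
    """Return (absolute change in percentage points, relative change in percent).

    Both values are negative when the perturbed prompts score lower than
    the baseline prompts. The relative change is NaN when the baseline
    accuracy is zero, because the ratio is undefined there.
    """
    base = accuracy(baseline)
    pert = accuracy(perturbed)
    absolute = (pert - base) * 100
    relative = (pert - base) / base * 100 if base > 0 else math.nan
    return absolute, relative


# Made-up example data: 8 of 10 baseline generations pass, 4 of 10 perturbed ones pass.
baseline_results = [True] * 8 + [False] * 2
perturbed_results = [True] * 4 + [False] * 6
absolute_pp, relative_pct = accuracy_change(baseline_results, perturbed_results)
print(f"absolute: {absolute_pp:.1f} points, relative: {relative_pct:.1f}%")
```

Reporting both numbers side by side avoids ambiguity, since the same drop can look very different depending on the baseline accuracy.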

Abstract

Large Language Models (LLMs) have achieved remarkable success in tasks requiring complex reasoning, such as code generation, mathematical problem solving, and algorithmic synthesis -- especially when aided by reasoning tokens and Chain-of-Thought prompting. Yet, a core question remains: do these models truly reason, or do they merely exploit shallow statistical patterns? In this paper, we introduce Chain-of-Code Collapse, where we systematically investigate the robustness of reasoning LLMs by introducing a suite of semantically faithful yet adversarially structured prompt perturbations. Our evaluation -- spanning 700 perturbed code generations derived from LeetCode-style problems -- applies transformations such as storytelling reframing, irrelevant constraint injection, example reordering, and numeric perturbation. We observe that while certain modifications severely degrade performance…
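
To make three of the transformations named in the abstract more concrete, here is a minimal sketch of how storytelling reframing, irrelevant constraint injection, and example reordering might be applied to a LeetCode-style prompt. Everything in it is an assumption for illustration: the `Problem` class, the function names, and the preamble and distractor strings are hypothetical and do not come from the paper or the jrohsc/Chain-of-Code-Collapse repository. Numeric perturbation is left out because this excerpt does not describe how it is performed.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Problem:
    """Hypothetical container for a coding problem and its worked examples."""
    statement: str
    examples: tuple = ()  # tuple of (input_text, output_text) pairs


def storytelling_reframe(problem, preamble="A librarian named Ada needs help with the following task."):
    """Wrap the unchanged task in a narrative preamble (assumed form of reframing)."""
    return replace(problem, statement=f"{preamble}\n\n{problem.statement}")


def inject_irrelevant_constraint(problem, distractor="Note: the office closes at 5 pm on Fridays."):
    """Append a sentence that has no bearing on the correct solution."""
    return replace(problem, statement=f"{problem.statement}\n{distractor}")


def reorder_examples(problem, seed=0):
    """Shuffle the worked examples deterministically; their content is untouched."""
    rng = random.Random(seed)
    shuffled = list(problem.examples)
    rng.shuffle(shuffled)
    return replace(problem, examples=tuple(shuffled))


def render(problem):
    """Render the problem as the prompt text that would be sent to a model."""
    lines = [problem.statement]
    for index, (input_text, output_text) in enumerate(problem.examples, start=1):
        lines.append(f"Example {index}: input={input_text} output={output_text}")
    return "\n".join(lines)


original = Problem(
    statement="Return the sum of two integers.",
    examples=(("1 2", "3"), ("4 5", "9"), ("0 0", "0")),
)
perturbed = reorder_examples(inject_irrelevant_constraint(storytelling_reframe(original)), seed=1)
print(render(perturbed))
```

Each function returns a new `Problem` and leaves the task itself intact, which matches the abstract's requirement that perturbations stay semantically faithful. The same reference solution and test cases can then be used to score both the original and the perturbed prompt.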

Code & Models

Repositories

jrohsc/Chain-of-Code-Collapse (Official)

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques