Understanding Chain-of-Thought Effectiveness in Code Generation: An Empirical and Information-Theoretic Analysis

Naizhu Jin; Zhong Li; Guang Yang; Tian Zhang; Qingkai Zeng

arXiv:2512.09679 · cs.SE · December 11, 2025


TL;DR

This paper systematically investigates how Chain-of-Thought (CoT) prompting improves code generation in large language models. It finds that structured, high-quality CoT substantially improves accuracy and token efficiency, with gains that depend on model size and on the type system of the target programming language.

Contribution

It provides an empirical and information-theoretic analysis of CoT prompting, comparing five prompting paradigms across six Python benchmarks, a 12-language multilingual benchmark, and six models, and offers practical guidance for using CoT effectively.

Findings

1. Externally guided CoT consistently outperforms direct generation.
2. Structured CoT improves Pass@1 by 5–12% on average while using fewer tokens than reflective reasoning.
3. High-quality reasoning from strong models yields higher accuracy.
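Pass@1 is the expected fraction of problems solved by a single sampled program. As a minimal sketch, assuming the widely used unbiased pass@k estimator introduced with HumanEval (this summary does not state which estimator or sample count the authors used), per-problem counts of sampled and passing programs can be turned into a benchmark score as below. The function names `pass_at_k` and `mean_pass_at_k` and all sample counts are hypothetical illustrations, not data from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of programs sampled, c: number that passed all tests,
    k: evaluation budget. Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of samples contains at least one passing program.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over a benchmark of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts for four problems under two prompting strategies.
direct = [(10, 3), (10, 0), (10, 10), (10, 5)]
structured = [(10, 4), (10, 1), (10, 10), (10, 6)]

print(round(mean_pass_at_k(direct, 1), 3))      # 0.45
print(round(mean_pass_at_k(structured, 1), 3))  # 0.525
```

For k = 1 the estimator reduces to c / n per problem, so Pass@1 is simply the average per-problem pass rate; the combinatorial form only matters for larger budgets.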

Abstract

Large language models (LLMs) achieve strong performance on code generation, but the mechanisms by which Chain-of-Thought (CoT) prompting helps remain unclear. We present a systematic empirical and information-theoretic study of CoT effectiveness in neural code generation, evaluating five paradigms (Zero-Shot, Zero-Shot CoT, Self-Planning, Structured CoT, Reasoning-CoT) across six Python benchmarks, a multilingual benchmark with 12 programming languages, and six models from 7B to 480B parameters, using conditional mutual information $I(Y; C \mid X)$ as a conceptual lens. Our results show that externally guided CoT consistently outperforms direct generation, with structured methods improving Pass@1 by 5–12% on average while using substantially fewer tokens than reflective reasoning, and that CoT benefits depend on language type systems and model capacity. We further find that reasoning…
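The abstract uses conditional mutual information only as a conceptual lens. By definition, $I(Y; C \mid X) = H(Y \mid X) - H(Y \mid X, C)$: the uncertainty about $Y$ that is removed by also observing $C$, beyond what $X$ already explains. Reading $X$ as the problem statement, $C$ as the reasoning chain, and $Y$ as the generated program is our interpretation, not something this summary spells out. The sketch below is not the paper's method; it computes the quantity exactly for small hypothetical discrete distributions, with $Y$ simplified to a pass/fail outcome. The function name and all distributions are illustrative assumptions.

```python
from collections import defaultdict
from math import log2


def conditional_mutual_information(joint: dict[tuple[str, str, str], float]) -> float:
    """I(Y; C | X) in bits for a discrete joint distribution p(x, y, c).

    Keys are (x, y, c) triples and values are probabilities summing to 1.
    Uses I = sum p(x,y,c) * log2( p(x) p(x,y,c) / (p(x,y) p(x,c)) ).
    """
    p_x: dict[str, float] = defaultdict(float)
    p_xy: dict[tuple[str, str], float] = defaultdict(float)
    p_xc: dict[tuple[str, str], float] = defaultdict(float)
    # Marginalize the joint to get p(x), p(x, y), and p(x, c).
    for (x, y, c), p in joint.items():
        p_x[x] += p
        p_xy[(x, y)] += p
        p_xc[(x, c)] += p
    total = 0.0
    for (x, y, c), p in joint.items():
        if p > 0:  # zero-probability outcomes contribute nothing
            total += p * log2(p * p_x[x] / (p_xy[(x, y)] * p_xc[(x, c)]))
    return total


# Hypothetical toy distributions over (problem x, outcome y, reasoning c).
# Informative: given the problem, the reasoning fully determines the outcome.
informative = {
    (x, y, c): 0.25
    for x in ("task1", "task2")
    for y, c in (("pass", "good_plan"), ("fail", "bad_plan"))
}
# Uninformative: given the problem, outcome and reasoning are independent.
uninformative = {
    (x, y, c): 0.125
    for x in ("task1", "task2")
    for y in ("pass", "fail")
    for c in ("good_plan", "bad_plan")
}

print(conditional_mutual_information(informative))    # 1.0
print(conditional_mutual_information(uninformative))  # 0.0
```

In the informative case the reasoning removes the full 1 bit of outcome uncertainty left after seeing the problem; in the uninformative case it adds nothing, which is the intuition the lens captures.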

