Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

Lukas Twist; Shu Yang; Hanqi Yan; Jingzhi Gong; Di Wang; Helen Yannakoudakis; Jie M. Zhang

arXiv:2601.21894·cs.LG·January 30, 2026

Open Access · 1 dataset

TL;DR

This study investigates how the structural complexity of code used in fine-tuning influences the reasoning abilities of large language models. It finds that restricting fine-tuning data to a targeted complexity level can improve reasoning more than structurally diverse code does.

Contribution

The paper introduces a data-centric approach analyzing code complexity's impact on LLM reasoning, emphasizing the importance of structural properties over sheer data diversity.

Findings

01. Restricting fine-tuning data to specific structural complexity improves reasoning performance.

02. Structural properties of code significantly influence the usefulness of code in training LLMs.

03. Data with controlled complexity outperforms structurally diverse code in 83% of experiments.
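
The first finding amounts to selecting fine-tuning examples whose complexity falls inside a chosen range. A minimal sketch of that selection step follows, assuming each example already carries a precomputed complexity score. The `Example` class, the `restrict_to_band` helper, and the thresholds are hypothetical illustrations; this summary does not state how the paper actually buckets its data.

```python
from dataclasses import dataclass


@dataclass
class Example:
    code: str
    complexity: int  # assumed to be precomputed, e.g. cyclomatic complexity


def restrict_to_band(examples, low, high):
    """Keep only examples whose complexity lies in [low, high] (inclusive)."""
    return [ex for ex in examples if low <= ex.complexity <= high]


# Hypothetical pool mixing simple and complex snippets.
pool = [
    Example("return x", 1),
    Example("if a: ...", 2),
    Example("for ...: if ...: ...", 4),
    Example("deeply nested branches", 9),
]

# Illustrative thresholds only; not taken from the paper.
mid_band = restrict_to_band(pool, low=2, high=4)
```

A structurally diverse baseline would instead sample from the whole pool without the range check.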

Abstract

Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these skills, but existing studies largely treat code as a generic training signal, leaving open the question of which properties of code actually contribute to improved reasoning. To address this gap, we study the structural complexity of code, which captures control flow and compositional structure that may shape how models internalise multi-step reasoning during fine-tuning. We examine two complementary settings: solution-driven complexity, where complexity varies across multiple solutions to the same problem, and problem-driven complexity, where complexity reflects variation in the underlying tasks. Using cyclomatic complexity and logical lines of code…
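
The two metrics named in the abstract can be approximated for Python code with the standard `ast` module. The sketch below is an illustration under stated assumptions, not the paper's tooling: it counts decision points for a McCabe-style cyclomatic score and counts statement nodes as logical lines. The function names and the exact set of counted node types are choices made here for illustration.

```python
import ast

# Node types treated as one extra decision path each (an approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two short-circuit decision points.
            score += len(node.values) - 1
    return score


def logical_lines(source: str) -> int:
    """Count statement nodes, ignoring blank lines, comments and layout."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2 == 0:
        return "neg-even"
    for i in range(n):
        if i == 3:
            return "has-three"
    return "other"
'''

cc = cyclomatic_complexity(SAMPLE)   # 1 + if + and + for + if = 5
lloc = logical_lines(SAMPLE)         # def, if, return, for, if, return, return = 7
```

Established analyzers differ in which constructs they count (comprehensions, `assert`, `match` cases), so scores from this sketch may not match theirs exactly.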

Code & Models

Datasets

itsluketwist/NotAllCodeIsEqual
dataset · 67 dl

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Artificial Intelligence in Healthcare and Education