AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents

Bhanu Prakash Vangala; Ali Adibifar; Ashish Gehani; Tanu Malik

arXiv:2512.22387 · cs.SE · March 25, 2026

TL;DR

This empirical study examines whether LLM-generated code is reproducible. Only about two-thirds of projects (68.3%) run successfully in a clean environment, and the dependencies present at runtime far exceed those the model declares.

Contribution

The paper introduces a three-layer dependency framework (claimed, working, and runtime dependencies) and provides the first large-scale empirical analysis of reproducibility issues in LLM-based coding agents.

Findings

01

68.3% of projects execute successfully out-of-the-box

02

Reproducibility varies substantially across programming languages (Python 89.2%, Java 44.0%)

03

Dependencies expand 13.5 times on average from declared to runtime

Abstract

The rise of Large Language Models (LLMs) as coding agents promises to accelerate software development, but their impact on generated code reproducibility remains largely unexplored. This paper presents an empirical study investigating whether LLM-generated code can be executed successfully in a clean environment with only OS packages and using only the dependencies that the model specifies. We evaluate three state-of-the-art LLM coding agents (Claude Code, OpenAI Codex, and Gemini) across 300 projects generated from 100 standardized prompts in Python, JavaScript, and Java. We introduce a three-layer dependency framework (distinguishing between claimed, working, and runtime dependencies) to quantify execution reproducibility. Our results show that only 68.3% of projects execute out-of-the-box, with substantial variation across languages (Python 89.2%, Java 44.0%). We also find a 13.5…
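One way to read the three-layer framework is as three sets of package names per project, with the expansion figure being a ratio of set sizes. The sketch below is illustrative only: the class name `DependencyLayers`, its methods, the interpretation of "working" and "runtime", and the example package lists are all assumptions made here, not definitions or data taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DependencyLayers:
    """Three dependency sets for one generated project (hypothetical model)."""

    # Packages the model itself declared, e.g. in a manifest file.
    claimed: frozenset[str]
    # Assumed meaning: packages that had to be installed for the project to run.
    working: frozenset[str]
    # Assumed meaning: every package present at run time, transitive ones included.
    runtime: frozenset[str]

    def undeclared(self) -> frozenset[str]:
        """Packages needed to run the project that the model never declared."""
        return self.working - self.claimed

    def expansion(self) -> float:
        """Ratio of runtime packages to claimed packages."""
        if not self.claimed:
            raise ValueError("expansion is undefined when nothing is claimed")
        return len(self.runtime) / len(self.claimed)


# Hypothetical project: two declared packages, one missing, eight at runtime.
example = DependencyLayers(
    claimed=frozenset({"flask", "requests"}),
    working=frozenset({"flask", "requests", "pyyaml"}),
    runtime=frozenset({
        "flask", "requests", "pyyaml", "werkzeug",
        "jinja2", "click", "urllib3", "idna",
    }),
)

print(sorted(example.undeclared()))  # ['pyyaml']
print(example.expansion())           # 4.0
```

Under this reading, a project fails out-of-the-box whenever `undeclared()` is non-empty, and averaging `expansion()` over all projects would give a figure comparable in kind to the reported expansion factor.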

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Software System Performance and Reliability