AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts of AI-Generated Code in Modern Software

Bin Wang, Wenjie Yu, Yilu Zhong, Hao Yu, Keke Lian, Chaohua Lu, Hongfang Zheng, Dong Zhang, and Hui Li

arXiv:2512.18567 · cs.SE · December 23, 2025

Open Access

TL;DR

This large-scale empirical study investigates the prevalence, security implications, and ecosystem impact of AI-generated code in modern software, revealing structured adoption patterns and security risks associated with AI in code development.

Contribution

The paper introduces a high-precision detection pipeline and a comprehensive benchmark to distinguish AI-generated code from human-written code in real-world repositories and vulnerability data.

Findings

01. AI-generated code constitutes a significant portion of new code, mainly in boilerplate and non-critical areas.

02. Certain security weakness types are overrepresented in AI-generated code, indicating potential security risks.

03. AI accelerates code changes but can introduce persistent vulnerabilities when human review is shallow.
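
One simple way to read "overrepresented" in finding 02 is as a ratio of shares: how often a weakness type appears among AI-labeled vulnerable changes compared with human-labeled ones. The sketch below assumes that reading. The function name, the use of CWE identifiers as weakness labels, and the sample lists are illustrative assumptions, and the paper may use a different statistic.

```python
from collections import Counter


def overrepresentation(ai_weaknesses, human_weaknesses):
    """Ratio of each weakness type's share in AI-labeled vs. human-labeled changes.

    A ratio above 1.0 means the type is relatively more common in the AI group.
    Types absent from the human group are skipped to avoid division by zero.
    """
    ai_counts = Counter(ai_weaknesses)
    human_counts = Counter(human_weaknesses)
    ai_total = sum(ai_counts.values())
    human_total = sum(human_counts.values())
    ratios = {}
    for cwe, n_ai in ai_counts.items():
        n_human = human_counts.get(cwe, 0)
        if n_human == 0 or ai_total == 0:
            continue
        ratios[cwe] = (n_ai / ai_total) / (n_human / human_total)
    return ratios


# Made-up labels for illustration only; not data from the paper.
ai_group = ["CWE-79", "CWE-79", "CWE-89", "CWE-20"]
human_group = ["CWE-79", "CWE-89", "CWE-89", "CWE-20"]

ratios = overrepresentation(ai_group, human_group)
```

In this toy input, a type with a ratio above 1.0 would be flagged as relatively more common in the AI-labeled group; a real analysis would also need a significance test and far larger samples.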

Abstract

Large language models (LLMs) for code generation are becoming integral to modern software development, but their real-world prevalence and security impact remain poorly understood. We present the first large-scale empirical study of AI-generated code (AIGCode) in the wild. We build a high-precision detection pipeline and a representative benchmark to distinguish AIGCode from human-written code, and apply them to (i) development commits from the top 1,000 GitHub repositories (2022-2025) and (ii) 7,000+ recent CVE-linked code changes. This lets us label commits, files, and functions along a human/AI axis and trace how AIGCode moves through projects and vulnerability life cycles. Our measurements show three ecological patterns. First, AIGCode is already a substantial fraction of new code, but adoption is structured: AI concentrates in glue code, tests, refactoring, documentation, and…
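
As a rough illustration of the labeling step the abstract describes, the sketch below aggregates function-level human/AI labels into an AI share per code category. The `LabeledFunction` record, the category names, and the sample rows are hypothetical placeholders, not the paper's detector or data; the `is_ai` label is assumed to come from some upstream detector.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledFunction:
    """One function-level record; all fields are hypothetical for illustration."""
    repo: str
    category: str   # e.g. "tests", "glue", "core"
    lines: int      # lines of code in the function
    is_ai: bool     # label assumed to come from an upstream detector


def ai_share_by_category(records):
    """Return {category: fraction of lines labeled AI-generated}."""
    ai_lines = defaultdict(int)
    total_lines = defaultdict(int)
    for r in records:
        total_lines[r.category] += r.lines
        if r.is_ai:
            ai_lines[r.category] += r.lines
    return {
        c: ai_lines[c] / total_lines[c]
        for c in total_lines
        if total_lines[c] > 0
    }


# Made-up sample rows, not measurements from the paper.
sample = [
    LabeledFunction("repo-a", "tests", 40, True),
    LabeledFunction("repo-a", "tests", 60, False),
    LabeledFunction("repo-a", "core", 100, False),
    LabeledFunction("repo-b", "glue", 30, True),
    LabeledFunction("repo-b", "glue", 10, False),
]

shares = ai_share_by_category(sample)
```

Weighting by lines rather than by function count is a design choice made here for simplicity; the same aggregation could be run at the commit or file level.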

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Ethics and Social Impacts of AI