Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories

Maximilian Schreiber; Pascal Tippe

arXiv:2510.26103·cs.CR·October 31, 2025

TL;DR

This large-scale empirical study analyzes security vulnerabilities in AI-generated code produced by four major AI tools and collected from public GitHub repositories, revealing patterns in language-specific vulnerabilities, differences in tool performance, and the use of AI for documentation.

Contribution

The study provides the first extensive analysis of security vulnerabilities in AI-generated code across multiple tools and languages, offering insights into vulnerability patterns and tool effectiveness.

Findings

01 · 87.9% of AI-generated code contains no identifiable CWE-mapped vulnerabilities
02 · Python shows higher vulnerability rates than JavaScript and TypeScript
03 · GitHub Copilot outperforms the other tools in security density for Python and TypeScript
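Finding 02 compares vulnerability rates across languages. The text shown here does not define the metric, so the sketch below assumes a rate is the share of files with at least one CWE finding, grouped by tool and language, and that the complementary "clean" share corresponds to Finding 01. The helper names vulnerability_rates and clean_share, the (tool, language, cwe_count) record layout, and all records are hypothetical, not data from the paper.

```python
from collections import defaultdict


def vulnerability_rates(files):
    """Return {(tool, language): % of files with at least one CWE finding}.

    `files` is an iterable of (tool, language, cwe_count) tuples; this record
    layout and the file-level definition of "rate" are assumptions.
    """
    totals = defaultdict(int)
    vulnerable = defaultdict(int)
    for tool, language, cwe_count in files:
        key = (tool, language)
        totals[key] += 1
        if cwe_count > 0:
            vulnerable[key] += 1  # a file counts once, however many findings
    return {key: 100.0 * vulnerable[key] / totals[key] for key in totals}


def clean_share(files):
    """Return the % of files with no CWE finding at all (hypothetical helper)."""
    files = list(files)
    clean = sum(1 for _, _, cwe_count in files if cwe_count == 0)
    return 100.0 * clean / len(files)


# Made-up records for illustration only; not data from the paper.
files = (
    [("ChatGPT", "Python", n) for n in (0, 3, 0, 0, 1)]
    + [("ChatGPT", "JavaScript", n) for n in (0, 0, 0, 1, 0)]
)

print(vulnerability_rates(files))
# {('ChatGPT', 'Python'): 40.0, ('ChatGPT', 'JavaScript'): 20.0}
print(clean_share(files))  # 70.0
```

Under this definition a file with three findings weighs the same as a file with one, so a file-level rate can diverge from a per-instance count. The "security density" metric in Finding 03 may be normalized differently (for example by code size); this sketch does not attempt to reproduce it.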

Abstract

This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories. We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT (91.52%), GitHub Copilot (7.50%), Amazon CodeWhisperer (0.52%), and Tabnine (0.46%). Using CodeQL static analysis, we identified 4,241 Common Weakness Enumeration (CWE) instances across 77 distinct vulnerability types. Our findings reveal that while 87.9% of AI-generated code does not contain identifiable CWE-mapped vulnerabilities, significant patterns emerge regarding language-specific vulnerabilities and tool performance. Python consistently exhibited higher vulnerability rates (16.18%-18.50%) compared to JavaScript (8.66%-8.99%) and TypeScript (2.50%-7.14%) across all tools. We observed notable differences in security performance, with GitHub…
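The abstract distinguishes CWE instances (4,241) from distinct vulnerability types (77). As a minimal sketch, assuming the CodeQL results are exported as SARIF and that CWE ids are read from CodeQL-style "external/cwe/cwe-NNN" rule tags, both numbers can be tallied as below. The function name cwe_counts and the SARIF fragment are illustrative; the authors' actual pipeline is not described in this excerpt, and counting a multi-tagged result once per CWE is an assumption.

```python
import re
from collections import Counter

# Assumption: CWE ids are taken from CodeQL-style rule tags such as
# "external/cwe/cwe-089" found under rules[].properties.tags in SARIF.
CWE_TAG = re.compile(r"^external/cwe/cwe-(\d+)$")


def cwe_counts(sarif: dict) -> Counter:
    """Tally CWE instances in a SARIF log (hypothetical helper).

    Each result contributes one instance per CWE tag on its rule, so a rule
    tagged with several CWEs is counted once for each of them.
    """
    counts = Counter()
    for run in sarif.get("runs", []):
        # Build a lookup from rule id to the CWE ids that rule is tagged with.
        rule_cwes = {}
        for rule in run.get("tool", {}).get("driver", {}).get("rules", []):
            cwes = []
            for tag in rule.get("properties", {}).get("tags", []):
                match = CWE_TAG.match(tag)
                if match:
                    # Normalize "cwe-089" to "CWE-89".
                    cwes.append(f"CWE-{int(match.group(1))}")
            rule_cwes[rule["id"]] = cwes
        # Attribute each reported result to its rule's CWEs.
        for result in run.get("results", []):
            for cwe in rule_cwes.get(result.get("ruleId"), []):
                counts[cwe] += 1
    return counts


# Hand-written, illustrative SARIF fragment (not output from the study).
sample = {
    "runs": [{
        "tool": {"driver": {"rules": [
            {"id": "py/sql-injection",
             "properties": {"tags": ["security", "external/cwe/cwe-089"]}},
            {"id": "py/example-multi-cwe",
             "properties": {"tags": ["external/cwe/cwe-312",
                                     "external/cwe/cwe-532"]}},
        ]}},
        "results": [
            {"ruleId": "py/sql-injection"},
            {"ruleId": "py/sql-injection"},
            {"ruleId": "py/example-multi-cwe"},
        ],
    }]
}

counts = cwe_counts(sample)
instances = sum(counts.values())   # 2 + 1 + 1 = 4 instances
distinct_types = len(counts)       # CWE-89, CWE-312, CWE-532 -> 3 types
print(instances, distinct_types)   # prints "4 3"
```

Summing the counter gives the instance total, while its number of keys gives the number of distinct CWE types, mirroring the two figures reported in the abstract.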
