Security Vulnerabilities in AI-Generated Code: A Large-Scale Analysis of Public GitHub Repositories
Maximilian Schreiber, Pascal Tippe

TL;DR
This large-scale empirical study analyzes security vulnerabilities in code on public GitHub repositories that is attributed to major AI tools, revealing patterns in language-specific vulnerabilities, tool performance differences, and the use of AI for documentation.
Contribution
It provides the first extensive analysis of security vulnerabilities in AI-generated code across multiple tools and languages, with insights into vulnerability patterns and tool effectiveness.
Findings
87.9% of AI-generated code contains no identifiable CWE-mapped vulnerabilities
Python shows higher vulnerability rates (16.18%-18.50%) than JavaScript (8.66%-8.99%) and TypeScript (2.50%-7.14%) across all tools
GitHub Copilot outperforms the other tools in security density for Python and TypeScript
Abstract
This paper presents a comprehensive empirical analysis of security vulnerabilities in AI-generated code across public GitHub repositories. We collected and analyzed 7,703 files explicitly attributed to four major AI tools: ChatGPT (91.52%), GitHub Copilot (7.50%), Amazon CodeWhisperer (0.52%), and Tabnine (0.46%). Using CodeQL static analysis, we identified 4,241 Common Weakness Enumeration (CWE) instances across 77 distinct vulnerability types. Our findings reveal that while 87.9% of AI-generated code does not contain identifiable CWE-mapped vulnerabilities, significant patterns emerge regarding language-specific vulnerabilities and tool performance. Python consistently exhibited higher vulnerability rates (16.18%-18.50%) compared to JavaScript (8.66%-8.99%) and TypeScript (2.50%-7.14%) across all tools. We observed notable differences in security performance, with GitHub…
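To make the tool distribution in the abstract concrete, the reported percentages can be converted into approximate file counts. The following is a minimal sketch, not part of the paper: the function name `approx_file_counts` is hypothetical, and the resulting counts are rounded estimates derived from the quoted shares rather than figures reported by the authors.

```python
# Figures quoted in the abstract.
TOTAL_FILES = 7703
TOOL_SHARE_PERCENT = {
    "ChatGPT": 91.52,
    "GitHub Copilot": 7.50,
    "Amazon CodeWhisperer": 0.52,
    "Tabnine": 0.46,
}


def approx_file_counts(total, share_percent):
    """Convert percentage shares into rounded, approximate file counts.

    Hypothetical helper for illustration; the paper may report exact counts
    that differ slightly from these rounded estimates.
    """
    return {tool: round(total * pct / 100) for tool, pct in share_percent.items()}


counts = approx_file_counts(TOTAL_FILES, TOOL_SHARE_PERCENT)
for tool, n in counts.items():
    print(f"{tool}: ~{n} files")
```

Under these assumptions the sample is heavily skewed toward ChatGPT, with only a few dozen files each for Amazon CodeWhisperer and Tabnine, so per-tool comparisons for the smaller tools rest on much less data.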
