AIDev: Studying AI Coding Agents on GitHub
Hao Li, Haoxiang Zhang, Ahmed E. Hassan

TL;DR
AIDev is a large-scale dataset of AI-generated pull requests on GitHub, enabling research into AI adoption, developer productivity, and human-AI collaboration in software engineering.
Contribution
This paper introduces AIDev, the first comprehensive dataset capturing real-world AI coding agent activity on GitHub, covering 932,791 agent-authored pull requests across 116,211 repositories.
Findings
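Provides data on how five AI coding agents (OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code) are used in real projects, spanning 116,211 repositories and 72,189 developers
Facilitates research on AI's impact on software development, including AI adoption and developer productivity
Supports analysis of human-AI collaboration patterns through a curated subset of 33,596 Agentic-PRs that includes comments, reviews, commits, and related issues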
Abstract
AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer…
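To put the headline figures in proportion, the counts stated in the abstract can be turned into a few derived ratios. The following is a minimal sketch that uses only the numbers quoted above; the function name `summarize_aidev` and the dictionary keys are hypothetical and are not part of the dataset or its tooling. The results are plain averages and say nothing about how PRs are actually distributed across repositories or developers.

```python
# Counts taken directly from the abstract.
TOTAL_PRS = 932_791        # Agentic-PRs in the full dataset
TOTAL_REPOS = 116_211      # repositories in the full dataset
TOTAL_DEVELOPERS = 72_189  # developers involved
CURATED_PRS = 33_596       # Agentic-PRs in the curated subset
CURATED_REPOS = 2_807      # repositories with over 100 stars


def summarize_aidev() -> dict:
    """Return simple averages and shares derived from the stated counts.

    Hypothetical helper for illustration only; not part of AIDev itself.
    """
    return {
        "prs_per_repo": TOTAL_PRS / TOTAL_REPOS,
        "prs_per_developer": TOTAL_PRS / TOTAL_DEVELOPERS,
        "curated_pr_share_pct": 100 * CURATED_PRS / TOTAL_PRS,
        "curated_repo_share_pct": 100 * CURATED_REPOS / TOTAL_REPOS,
        "curated_prs_per_repo": CURATED_PRS / CURATED_REPOS,
    }


if __name__ == "__main__":
    for name, value in summarize_aidev().items():
        print(f"{name}: {value:.2f}")
```

Under these figures, the full dataset averages roughly 8 Agentic-PRs per repository and roughly 13 per developer, while the curated subset covers about 3.6% of the PRs and about 2.4% of the repositories.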
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Artificial Intelligence in Games
