Does AI Code Review Lead to Code Changes? A Case Study of GitHub Actions
Kexin Sun, Hongyu Kuang, Sebastian Baltes, Xin Zhou, He Zhang, Xiaoxing Ma, Guoping Rong, Dong Shao, and Christoph Treude

TL;DR
This study empirically evaluates the adoption, configuration, and effectiveness of AI-based code review tools on GitHub, revealing factors that influence whether comments lead to code changes.
Contribution
It introduces a two-stage LLM-assisted framework to assess comment responsiveness and identifies key factors affecting AI review tool effectiveness.
Findings
Adoption of AI review tools is increasing, but their effectiveness varies widely.
Concise comments with code snippets are more likely to prompt changes.
Hunk-level review tools have a higher impact on code modifications.
Abstract
AI-based code review tools automatically review and comment on pull requests to improve code quality. Despite their growing presence, little is known about their actual impact. We present a large-scale empirical study of 16 popular AI-based code review actions for GitHub workflows, analyzing more than 22,000 review comments in 178 repositories. We investigate (1) how these tools are adopted and configured, (2) whether their comments lead to code changes, and (3) which factors influence their effectiveness. We develop a two-stage LLM-assisted framework to determine whether review comments are addressed, and use interpretable machine learning to identify influencing factors. Our findings show that, while adoption is growing, effectiveness varies widely. Comments that are concise, contain code snippets, and are manually triggered, particularly those from hunk-level review tools, are more…
