Understanding Dominant Themes in Reviewing Agentic AI-authored Code

Md. Asif Haider; Thomas Zimmermann

arXiv:2601.19287 · cs.SE · January 28, 2026

TL;DR

This study analyzes how reviewers respond to AI-generated code in open-source projects. It identifies the key themes of review comments and evaluates LLMs for annotating those comments, finding that reviews focus mainly on documentation, refactoring, styling, testing, and security.

Contribution

It introduces a taxonomy of 12 review comment themes for AI-authored code and shows that LLMs can reliably annotate review comments at scale, in close agreement with human judgments.
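Zero-shot annotation means giving the model the list of allowed themes and a single review comment, with no labeled examples, and then mapping its free-text reply back onto the taxonomy. The Python sketch below illustrates that loop under stated assumptions: `build_zero_shot_prompt` and `parse_theme` are hypothetical helpers, the prompt wording is invented, and the example theme list contains only the five themes named on this page rather than the full 12-theme taxonomy. It is not the authors' prompt or pipeline.

```python
from typing import Optional


def build_zero_shot_prompt(comment: str, themes: list[str]) -> str:
    """Hypothetical prompt: instructions, allowed themes, and one comment.

    No labeled examples are included, which is what makes it zero-shot.
    """
    theme_lines = "\n".join(f"- {t}" for t in themes)
    return (
        "You are annotating inline code review comments on pull requests.\n"
        "Assign exactly one theme from the list below to the comment.\n"
        "Answer with the theme name only.\n\n"
        f"Themes:\n{theme_lines}\n\n"
        f"Review comment:\n{comment}\n\n"
        "Theme:"
    )


def parse_theme(response: str, themes: list[str]) -> Optional[str]:
    """Map a model reply onto the taxonomy; return None for off-taxonomy answers."""
    cleaned = response.strip().strip(".").lower()
    # Prefer an exact (case-insensitive) match with a theme name.
    for t in themes:
        if cleaned == t.lower():
            return t
    # Fall back to a substring match for chatty replies such as "It is about testing".
    for t in themes:
        if t.lower() in cleaned:
            return t
    return None


if __name__ == "__main__":
    # Only the five themes named in the summary; the paper's taxonomy has 12.
    themes = ["documentation", "refactoring", "styling", "testing", "security"]
    print(build_zero_shot_prompt("Please add a docstring for this helper.", themes))
    print(parse_theme("Documentation.", themes))  # documentation
```

Returning `None` for replies that match no theme makes off-taxonomy answers explicit, so they can be counted as errors or re-prompted rather than silently coerced into a label.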

Findings

01

Review comments mainly focus on documentation, refactoring, styling, testing, and security.

02

An open-source LLM, prompted zero-shot, annotates review comments with reasonably high accuracy (78.63% exact match).

03

AI-generated code reviews still require human oversight for certain aspects.

Abstract

While prior work has examined the generation capabilities of Agentic AI systems, little is known about how reviewers respond to AI-authored code in practice. In this paper, we present a large-scale empirical study of code review dynamics in agent-generated pull requests (PRs). Using a curated subset of the AIDev dataset, we analyze 19,450 inline review comments spanning 3,177 agent-authored PRs from real-world GitHub repositories. We first derive a taxonomy of 12 review comment themes using topic modeling combined with large language model (LLM)-assisted semantic clustering and consolidation. Using this taxonomy, we then investigate whether zero-shot prompts to an LLM can reliably annotate review comments. Our evaluation against human annotations shows that an open-source LLM achieves reasonably high exact match (78.63%), macro F1 score (0.78), and substantial agreement with human annotators at the…
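The abstract reports three evaluation figures: exact match, macro F1, and agreement with human annotators. The Python sketch below shows one common way to compute them, under two assumptions: each comment carries exactly one theme label (so exact match reduces to plain accuracy), and the agreement statistic, whose name is cut off in the abstract, is Cohen's kappa. The paper may define these differently, for instance over multi-label annotations. On the Landis and Koch scale often used to interpret kappa, values from 0.61 to 0.80 are called "substantial".

```python
from collections import Counter
from typing import Sequence


def _check(human: Sequence[str], model: Sequence[str]) -> None:
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty label lists of equal length")


def exact_match(human: Sequence[str], model: Sequence[str]) -> float:
    """Fraction of comments where the model's theme equals the human theme."""
    _check(human, model)
    return sum(h == m for h, m in zip(human, model)) / len(human)


def macro_f1(human: Sequence[str], model: Sequence[str]) -> float:
    """Unweighted mean of per-theme F1, so rare themes count as much as common ones."""
    _check(human, model)
    scores = []
    for label in sorted(set(human) | set(model)):
        tp = sum(h == label and m == label for h, m in zip(human, model))
        fp = sum(h != label and m == label for h, m in zip(human, model))
        fn = sum(h == label and m != label for h, m in zip(human, model))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def cohens_kappa(human: Sequence[str], model: Sequence[str]) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    _check(human, model)
    n = len(human)
    p_observed = exact_match(human, model)
    human_counts, model_counts = Counter(human), Counter(model)
    # Chance agreement: probability that both raters independently pick the same theme.
    p_expected = sum(human_counts[t] * model_counts[t] for t in human_counts) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    human = ["testing", "security", "styling", "testing"]
    model = ["testing", "security", "testing", "testing"]
    print(exact_match(human, model))            # 0.75
    print(round(macro_f1(human, model), 4))     # 0.6
    print(round(cohens_kappa(human, model), 4)) # 0.5556
```

The toy run shows why the three numbers differ: one mislabeled comment costs 25% exact match, but it zeroes the F1 of the rare "styling" theme, pulling macro F1 down further, and kappa discounts the agreement that the skew toward "testing" would produce by chance.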