Human-Agent versus Human Pull Requests: A Testing-Focused Characterization and Comparison

Roberto Milanese; Francesco Salzano; Angelica Spina; Antonio Vitale; Remo Pareschi; Fausto Fasano; Mattia Fazzini

arXiv:2601.21194·cs.SE·January 30, 2026

Open Access

TL;DR

This study empirically compares human-agent and human-only pull requests, revealing that human-agent PRs tend to include more extensive testing and are more likely to add new tests, with similar overall testing quality.

Contribution

The study provides the first detailed characterization of testing practices in human-agent collaboration within software development workflows.

Findings

01. HAPRs include tests at a similar rate to HPRs (42.9% vs. 40.0%).

02. HAPRs have nearly double the test-to-source line ratio of HPRs.

03. HAPRs are more likely than HPRs to add new tests during code-and-test co-evolution.

Abstract

AI-based coding agents are increasingly integrated into software development workflows, collaborating with developers to create pull requests (PRs). Despite their growing adoption, the role of human-agent collaboration in software testing remains poorly understood. This paper presents an empirical study of 6,582 human-agent PRs (HAPRs) and 3,122 human PRs (HPRs) from the AIDev dataset. We compare HAPRs and HPRs along three dimensions: (i) testing frequency and extent, (ii) types of testing-related changes (code-and-test co-evolution vs. test-focused), and (iii) testing quality, measured by test smells. Our findings reveal that, although the likelihood of including tests is comparable (42.9% for HAPRs vs. 40.0% for HPRs), HAPRs exhibit a larger extent of testing, nearly doubling the test-to-source line ratio found in HPRs. While test-focused task distributions are comparable, HAPRs are…
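To make the test-to-source line ratio concrete, here is a minimal Python sketch of how such a ratio could be computed for a single PR. It is an illustration under stated assumptions, not the authors' measurement procedure: the `FileChange` record, the path-based `is_test_file` heuristic, and the choice to count added plus deleted lines are all hypothetical, and the paper may define test files and changed lines differently.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic (not taken from the paper): a path is treated as a
# test file if it sits in a test directory or its file name signals a test.
TEST_PATH = re.compile(
    r"(^|/)(tests?|__tests__|specs?)/"       # a test directory anywhere in the path
    r"|(^|/)test_[^/]+$"                     # e.g. test_app.py
    r"|[^/]+[._-](test|spec)\.[^/.]+$",      # e.g. foo_test.go, button.spec.ts
    re.IGNORECASE,
)


@dataclass
class FileChange:
    """One changed file in a PR (hypothetical record type)."""
    path: str
    added: int
    deleted: int


def is_test_file(path: str) -> bool:
    return TEST_PATH.search(path) is not None


def ratio_of_test_to_source_lines(changes):
    """Changed test lines divided by changed non-test lines.

    Simplification: every non-test file counts as source, including docs
    and configuration. Returns None when no source lines changed.
    """
    test_lines = sum(c.added + c.deleted for c in changes if is_test_file(c.path))
    source_lines = sum(c.added + c.deleted for c in changes if not is_test_file(c.path))
    if source_lines == 0:
        return None
    return test_lines / source_lines


example_pr = [
    FileChange("src/app.py", added=40, deleted=10),
    FileChange("tests/test_app.py", added=20, deleted=5),
]
print(ratio_of_test_to_source_lines(example_pr))  # 25 test lines / 50 source lines = 0.5
```

Under this definition, a PR that touches only test files has no defined ratio, so an aggregate comparison between HAPRs and HPRs would need an explicit rule for such PRs.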

Taxonomy

Topics: Software Engineering Techniques and Practices · Software Testing and Debugging Techniques · Mobile Crowdsensing and Crowdsourcing