Evaluating the Impact of Data Cleaning on the Quality of Generated Pull Request Descriptions

Kutay Tire; Berk Çakar; Eray Tüzün

arXiv:2505.01120·cs.SE·May 5, 2025

Open Access

TL;DR

This paper investigates how cleaning noisy pull request (PR) data affects the quality of AI-generated PR descriptions, and reports that filtering noise from a large dataset yields significant performance gains across four models (BART, T5, PRSummarizer, and iTAPE).

Contribution

It introduces four heuristics for cleaning PR datasets and demonstrates their effectiveness in enhancing description generation models' performance.

Findings

01

Cleaning datasets improves ROUGE scores by around 8.6%.

02

Models trained on cleaned data produce more relevant and readable descriptions.

03

Dataset refinement significantly benefits AI tools for PR description generation.

Abstract

Pull Requests (PRs) are central to collaborative coding, summarizing code changes for reviewers. However, many PR descriptions are incomplete, uninformative, or contain out-of-context content, which compromises developer workflows and hinders AI-based generation models that are trained on commit messages and original descriptions as "ground truth." This study examines the prevalence of "noisy" PRs and evaluates their impact on state-of-the-art description generation models. To do so, we propose four cleaning heuristics to filter noise from an initial dataset of 169K+ PRs drawn from 513 GitHub repositories. We train four models (BART, T5, PRSummarizer, and iTAPE) on both the raw and the cleaned datasets. Performance is measured via the ROUGE-1, ROUGE-2, and ROUGE-L metrics, alongside a manual evaluation that assesses description quality improvements from a human perspective. Cleaning the dataset yields significant…
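The ROUGE metrics named in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code: it assumes lowercase whitespace tokenization, no stemming, and F1 as the reported score, and the function names (`rouge_n`, `rouge_l`) and the example strings are hypothetical. ROUGE-1 and ROUGE-2 count overlapping unigrams and bigrams between a generated description and the reference; ROUGE-L uses the longest common subsequence of tokens.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 under the simplifying assumptions stated above."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the token-level longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical example strings, not taken from the paper's dataset.
reference = "fix null pointer in parser"
candidate = "fix null pointer bug in the parser"
print(round(rouge_n(candidate, reference, 1), 3))  # 0.833
print(round(rouge_n(candidate, reference, 2), 3))  # 0.4
print(round(rouge_l(candidate, reference), 3))     # 0.833
```

Here the generated text covers every reference word, so unigram recall is perfect and precision is 5/7, while the two inserted words break half of the reference bigrams, which is why ROUGE-2 is much lower. Established ROUGE implementations add stemming and other normalization, so their scores can differ from this sketch.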

Taxonomy

Topics: Software System Performance and Reliability · Software Reliability and Analysis Research · Software Testing and Debugging Techniques