Evaluating the Impact of Data Cleaning on the Quality of Generated Pull Request Descriptions
Kutay Tire, Berk Çakar, Eray Tüzün

TL;DR
This paper investigates how data cleaning improves the quality of AI-generated pull request descriptions by filtering noise from large datasets, leading to significant performance gains in multiple models.
Contribution
It introduces four heuristics for cleaning PR datasets and demonstrates their effectiveness in enhancing description generation models' performance.
Findings
Cleaning datasets improves ROUGE scores by around 8.6%.
Models trained on cleaned data produce more relevant and readable descriptions.
Dataset refinement significantly benefits AI tools for PR description generation.
Abstract
Pull Requests (PRs) are central to collaborative coding, summarizing code changes for reviewers. However, many PR descriptions are incomplete, uninformative, or have out-of-context content, compromising developer workflows and hindering AI-based generation models trained on commit messages and original descriptions as "ground truth." This study examines the prevalence of "noisy" PRs and evaluates their impact on state-of-the-art description generation models. To do so, we propose four cleaning heuristics to filter noise from an initial dataset of 169K+ PRs drawn from 513 GitHub repositories. We train four models (BART, T5, PRSummarizer, and iTAPE) on both raw and cleaned datasets. Performance is measured via ROUGE-1, ROUGE-2, and ROUGE-L metrics, alongside a manual evaluation to assess description quality improvements from a human perspective. Cleaning the dataset yields significant…
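For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how ROUGE-N and ROUGE-L F1 scores can be computed between a reference description and a generated one. It assumes lowercased whitespace tokenization and no stemming; the function names and the example strings are illustrative only and are not taken from the paper, whose exact evaluation setup may differ.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, cand_total):
    """Harmonic mean of precision (overlap/cand) and recall (overlap/ref)."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1 based on clipped n-gram overlap.

    Assumption: simple lowercase + whitespace tokenization.
    """
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # multiset intersection clips counts
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    # Hypothetical PR descriptions, used only to exercise the functions.
    reference = "fix null pointer in parser"
    candidate = "fix pointer bug in parser"
    print(rouge_n(reference, candidate, 1))
    print(rouge_n(reference, candidate, 2))
    print(rouge_l(reference, candidate))
```

Widely used ROUGE implementations typically add steps such as stemming or different tokenization, so scores from this sketch are not directly comparable to published numbers.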
Taxonomy
Topics: Software System Performance and Reliability · Software Reliability and Analysis Research · Software Testing and Debugging Techniques
