Using Source Code Density to Improve the Accuracy of Automatic Commit Classification into Maintenance Activities

Sebastian Hönel; Morgan Ericsson; Welf Löwe; Anna Wingkvist

arXiv:2005.13904·cs.SE·May 3, 2021

TL;DR

This paper introduces source code density as a new feature to enhance the accuracy of automatic commit classification, demonstrating significant improvements across multiple projects and scenarios.

Contribution

It proposes source code density as a novel feature and explores how previous commits' code density influences classification accuracy.

Findings

1. Achieved up to 89% accuracy in cross-project classification.
2. Improved classification accuracy by incorporating code density.
3. Models trained on single projects reached up to 93% accuracy.

Abstract

Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely explicitly documented when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason. We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further. We achieve up to 89% accuracy and a Kappa of 0.82 for the cross-project commit classification where the model is trained on one project and applied to other projects. Models…
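The abstract describes source code density only as a measure of the net size of a commit and does not give a formula. As a rough illustration, the sketch below assumes density is the ratio of net lines to gross lines in a change, where a line counts as "net" if it is neither blank nor a single-line `//` comment. Both the ratio and the filtering rule are assumptions made for this example, and the names `is_net_line` and `code_density` are hypothetical; the paper's actual definition may exclude more (or different) content.

```python
from typing import List


def is_net_line(line: str) -> bool:
    """Assumed rule: a line is 'net' if it is not blank and not a // comment."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("//")


def code_density(changed_lines: List[str]) -> float:
    """Assumed definition: net lines divided by gross lines of a change.

    Returns 0.0 for an empty change to avoid division by zero.
    """
    gross = len(changed_lines)
    if gross == 0:
        return 0.0
    net = sum(1 for line in changed_lines if is_net_line(line))
    return net / gross


# A made-up six-line change: one comment, two blank lines, three code lines.
example_commit = [
    "// Fix off-by-one in loop bound",
    "",
    "for (int i = 0; i < n; i++) {",
    "    total += values[i];",
    "}",
    "",
]

density = code_density(example_commit)
print(f"density = {density:.2f}")
```

Under these assumptions, a commit that mostly touches comments or whitespace gets a low density, while one that mostly changes functional code gets a density close to 1. A classifier could use such a value alongside raw size features.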

Code & Models

Repositories

MrShoenel/git-density
