An Empirical Study of Token-based Micro Commits
Masanari Kondo, Daniel M. German, Yasutaka Kamei, Naoyasu Ubayashi, Osamu Mizuno

TL;DR
This paper introduces a token-based definition of micro commits to better characterize small code changes, enabling more precise analysis of their nature and role in bug fixing across open-source projects.
Contribution
It proposes a novel token-level approach to define and analyze micro commits, improving upon line-based methods and providing insights into their characteristics and impact.
Findings
Micro commits mainly involve replacing a single name or literal token.
Micro commits are more frequently used for bug fixes.
Token-based analysis distinguishes different types of small changes effectively.
Abstract
In software development, developers frequently apply maintenance activities to the source code that change a few lines by a single commit. A good understanding of the characteristics of such small changes can support quality assurance approaches (e.g., automated program repair), as it is likely that small changes are addressing deficiencies in other changes; thus, understanding the reasons for creating small changes can help understand the types of errors introduced. Eventually, these reasons and the types of errors can be used to enhance quality assurance approaches for improving code quality. While prior studies used code churns to characterize and investigate the small changes, such a definition has a critical limitation. Specifically, it loses the information of changed tokens in a line. For example, this definition fails to distinguish the following two one-line changes: (1)…
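The limitation described in the abstract can be illustrated with a small sketch. The two one-line changes used here are hypothetical examples chosen for illustration; they are not the pair elided from the abstract above. The tokenizer is a deliberately simple regular expression and the helper names (`tokenize`, `token_churn`) are assumptions; the paper's actual tokenizer and its threshold for a micro commit are not stated in this summary. Under those assumptions, a line-based count reports both changes as "1 line removed, 1 line added", while a token-based count separates them:

```python
import difflib
import re

# Hypothetical, simplified tokenizer: identifiers, numbers, double-quoted
# strings, and any other single non-space character as its own token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\"[^\"]*\"|\S")


def tokenize(line):
    """Split one source line into a flat list of tokens."""
    return TOKEN_RE.findall(line)


def token_churn(old_line, new_line):
    """Return (removed_tokens, added_tokens) between two versions of a line."""
    old, new = tokenize(old_line), tokenize(new_line)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1  # tokens present only in the old line
            added += j2 - j1    # tokens present only in the new line
    return removed, added


# Both changes touch exactly one line, so line-based churn is identical.
small = token_churn("if (count > 10) {", "if (count > 20) {")
large = token_churn("return a + b;", "return compute(a, b) * scale;")

print(small)  # (1, 1): a single literal token was replaced
print(large)  # (1, 6): most of the expression was rewritten
```

The first change replaces a single literal, which matches the kind of edit the Findings section reports as typical of micro commits; the second rewrites most of the statement even though it is still a one-line change.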
Taxonomy
Topics: Advanced Data Storage Technologies
