Identifying change patterns in software history
Jason Dagit, Matthew Sottile

TL;DR
This paper introduces a method for analyzing software evolution by grouping and generalizing structural code changes at the syntax tree level, enabling better understanding of software development patterns over time.
Contribution
It proposes a novel approach combining structural differencing, similarity grouping, and antiunification to identify change patterns in software history.
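The structural-differencing idea can be illustrated with a minimal sketch. It assumes Python 3.8+ source and uses the standard `ast` module as a stand-in for the paper's syntax trees; it is a naive field-by-field comparison, not the differencing algorithm the authors use. The names `structural_diff`, `_diff_values` and `_show` are hypothetical. The sketch reports the smallest differing subtree for each change instead of the whole changed line.

```python
import ast
from typing import Any, List, Tuple

Change = Tuple[str, str, str]  # (path to node, old rendering, new rendering)


def _show(value: Any) -> str:
    """Render an AST node, a list of fields, or a plain value for reporting."""
    if isinstance(value, ast.AST):
        return ast.dump(value)
    if isinstance(value, list):
        return "[" + ", ".join(_show(v) for v in value) + "]"
    return repr(value)


def _diff_values(a: Any, b: Any, path: str) -> List[Change]:
    """Compare one field value (node, list, or primitive) taken from each tree."""
    if isinstance(a, ast.AST) and isinstance(b, ast.AST):
        return structural_diff(a, b, path)
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            # Elements were added or removed: report the whole list (no alignment attempted).
            return [(path, _show(a), _show(b))]
        changes: List[Change] = []
        for i, (x, y) in enumerate(zip(a, b)):
            changes += _diff_values(x, y, f"{path}[{i}]")
        return changes
    if a != b:
        return [(path, _show(a), _show(b))]
    return []


def structural_diff(old: ast.AST, new: ast.AST, path: str = "") -> List[Change]:
    """Return the smallest differing subtrees between two ASTs, with their paths."""
    if type(old) is not type(new):
        return [(path, _show(old), _show(new))]
    changes: List[Change] = []
    # Nodes of the same type have the same fields in the same order.
    for (name, a), (_, b) in zip(ast.iter_fields(old), ast.iter_fields(new)):
        changes += _diff_values(a, b, f"{path}.{name}")
    return changes
```

On `y = f(a, b)` versus `y = f(a, b, c)`, the sketch reports a single change at `.body[0].value.args`, the argument list of the call, where a line-based diff would only say that the whole line was replaced. It does not detect moved code or compute a minimal edit script.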
Findings
Structural differencing reveals meaningful code change patterns.
Tree similarity metrics effectively group related changes.
Antiunification generalizes concrete changes into recognizable patterns.
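Grouping related changes by tree similarity can be sketched as follows. The summary does not specify the distance variant or the grouping procedure, so this sketch assumes unit-cost ordered tree edit distance (insert, delete and relabel each cost 1), computed with the textbook recursive forest decomposition plus memoization (adequate for small trees; Zhang and Shasha's algorithm is the efficient choice). It also assumes that a change is a (before, after) pair of subtrees, that the distance between two changes is the sum of the before-side and after-side distances, and that grouping is a greedy threshold pass. `change_distance`, `group_changes` and `threshold` are hypothetical names.

```python
from functools import lru_cache
from typing import List, Tuple

# Hypothetical representation: a tree is (label, children), children a tuple of trees.
Tree = Tuple[str, tuple]
Forest = Tuple[Tree, ...]
Change = Tuple[Tree, Tree]  # (subtree before the commit, subtree after it)


def leaf(label: str) -> Tree:
    return (label, ())


def size(forest: Forest) -> int:
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f: Forest, g: Forest) -> int:
    """Unit-cost ordered tree edit distance between two forests."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (lv, cv), (lw, cw) = f[-1], g[-1]  # rightmost roots and their children
    return min(
        forest_distance(f[:-1] + cv, g) + 1,  # delete f's rightmost root, promoting its children
        forest_distance(f, g[:-1] + cw) + 1,  # insert g's rightmost root
        # map the two roots onto each other (cost 1 if the labels differ)
        forest_distance(cv, cw) + forest_distance(f[:-1], g[:-1]) + int(lv != lw),
    )


def tree_distance(a: Tree, b: Tree) -> int:
    return forest_distance((a,), (b,))


def change_distance(c1: Change, c2: Change) -> int:
    """Assumption: compare before-sides and after-sides separately and add the distances."""
    return tree_distance(c1[0], c2[0]) + tree_distance(c1[1], c2[1])


def group_changes(changes: List[Change], threshold: int) -> List[List[Change]]:
    """Greedy grouping: a change joins the first group whose first member is within threshold."""
    groups: List[List[Change]] = []
    for change in changes:
        for group in groups:
            if change_distance(change, group[0]) <= threshold:
                group.append(change)
                break
        else:
            groups.append([change])
    return groups
```

For the changes `call(f, x) → call(f, x, ctx)` and `call(g, y) → call(g, y, ctx)`, the change distance is 4 (two relabels on each side), so with `threshold=4` they fall into one group, while `return x → return y` is at distance 6 from the first change and starts a group of its own.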
Abstract
Traditional algorithms for detecting differences in source code focus on differences between lines. As such, little can be learned from them about the abstract changes that occur over time within a project. Structural differencing on the program's abstract syntax tree reveals changes at the syntactic level, which allows us to further process the differences to understand their meaning. We propose that grouping changes by some metric of similarity, followed by pattern extraction via antiunification, will allow us to identify patterns of change within a software project from the sequence of changes contained in a Version Control System (VCS). Tree similarity metrics such as tree edit distance can be used to group changes into clusters that may each represent a single class of change (e.g., adding a parameter to a function call). By applying antiunification within each…
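The pattern-extraction step via antiunification can be illustrated with a minimal sketch of first-order anti-unification (Plotkin's least general generalization). It assumes the same hypothetical `(label, children)` tree encoding as a plain-Python stand-in for syntax trees, assumes a change is encoded as a tree with a `"change"` root whose two children are the before and after subtrees (so that pattern variables are shared across both sides), and assumes no input label starts with `?`, which is reserved for pattern variables. `antiunify` and `generalize` are hypothetical names, not the authors' implementation.

```python
from functools import reduce
from typing import Dict, List, Tuple

Tree = Tuple[str, tuple]  # (label, children), children a tuple of trees


def leaf(label: str) -> Tree:
    return (label, ())


def antiunify(s: Tree, t: Tree) -> Tree:
    """Least general generalization of two trees.

    Mismatched subtree pairs become pattern variables "?0", "?1", ...;
    the same pair of subtrees always maps to the same variable.
    """
    table: Dict[Tuple[Tree, Tree], Tree] = {}

    def go(a: Tree, b: Tree) -> Tree:
        if a == b:
            return a
        (la, ca), (lb, cb) = a, b
        if la == lb and len(ca) == len(cb):
            # Same constructor and arity: keep it and generalize the children pairwise.
            return (la, tuple(go(x, y) for x, y in zip(ca, cb)))
        if (a, b) not in table:
            table[(a, b)] = leaf(f"?{len(table)}")
        return table[(a, b)]

    return go(s, t)


def generalize(group: List[Tree]) -> Tree:
    """Fold pairwise anti-unification over a group of changes."""
    return reduce(antiunify, group)
```

Generalizing the changes `call(f, x) → call(f, x, ctx)` and `call(g, y) → call(g, y, ctx)` yields the pattern `call(?0, ?1) → call(?0, ?1, ctx)`, which reads as "append the argument `ctx` to a two-argument call", an instance of the "adding a parameter to a function call" class of change.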
