RefDiff: Detecting Refactorings in Version Histories
Danilo Silva, Marco Tulio Valente

TL;DR
RefDiff is an automated tool that accurately detects various refactoring operations between code revisions in git repositories, aiding understanding of software evolution.
Contribution
RefDiff introduces a novel combination of heuristics based on static analysis and code similarity to identify 13 refactoring types with high precision and recall.
Findings
Achieved 100% precision and 88% recall on an oracle of 448 refactorings.
Outperformed existing state-of-the-art refactoring detection tools.
Effective across seven Java projects.
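Table I: Relationships that RefDiff can identify between types, methods, and fields

| Entity kind | Relationship | Condition |
|---|---|---|
| Type | Same Type | same qualified name before and after the change |
| Type | Rename Type | |
| Type | Move Type | |
| Type | Move and Rename Type | |
| Type | Extract Supertype | |
| Method | Same Method | |
| Method | Rename Method | different names; matching container classes; similarity index above a threshold |
| Method | Change Method Signature | |
| Method | Pull Up Method | |
| Method | Push Down Method | |
| Method | Move Method | |
| Method | Extract Method | extracted method was added; source method was not removed and calls the extracted method; similarity index above a threshold |
| Method | Inline Method | |
| Field | Same Field | |
| Field | Pull Up Field | |
| Field | Push Down Field | |
| Field | Move Field | |
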
| Repository URL | Commit |
|---|---|
| github.com/linkedin/rest.li | 54fa890 |
| github.com/droolsjbpm/jbpm | 3815f29 |
| github.com/gradle/gradle | 44aab62 |
| github.com/jenkinsci/workflow-plugin | d0e374c |
| github.com/spring-projects/spring-roo | 0bb4cca |
| github.com/BuildCraft/BuildCraft | a5cdd8c |
| github.com/droolsjbpm/drools | 1bf2875 |
| github.com/jersey/jersey | d94ca2b |
| github.com/undertow-io/undertow | d5b2bb8 |
| github.com/kuujo/copycat | 19a49f8 |

| Ref. Type | # | Threshold | TP | FP | FN | Precision | Recall |
|---|---|---|---|---|---|---|---|
| Rename Type | 2 | 0.4 | 2 | 0 | 0 | 1.000 | 1.000 |
| Move Type | 2 | 0.9 | 2 | 0 | 0 | 1.000 | 1.000 |
| Extract Superclass | 2 | 0.8 | 2 | 0 | 0 | 1.000 | 1.000 |
| Rename Method | 24 | 0.3 | 22 | 3 | 2 | 0.880 | 0.917 |
| Pull Up Method | 7 | 0.4 | 7 | 0 | 0 | 1.000 | 1.000 |
| Push Down Method | 2 | 0.6 | 2 | 0 | 0 | 1.000 | 1.000 |
| Move Method | 24 | 0.4 | 21 | 1 | 3 | 0.955 | 0.875 |
| Extract Method | 25 | 0.1 | 25 | 9 | 0 | 0.735 | 1.000 |
| Inline Method | 6 | 0.3 | 5 | 2 | 1 | 0.714 | 0.833 |
| Pull Up Field | 2 | 0.5 | 2 | 0 | 0 | 1.000 | 1.000 |
| Push Down Field | 5 | 0.3 | 5 | 0 | 0 | 1.000 | 1.000 |
| Move Field | 1 | 0.5 | 1 | 1 | 0 | 0.500 | 1.000 |
| Total | 102 | | 96 | 16 | 6 | 0.857 | 0.941 |

| Repository URL | Description | LOC |
|---|---|---|
| github.com/Atmosphere/atmosphere | The Asynchronous WebSocket/Comet Framework | 65,841 |
| github.com/clojure/clojure | The Clojure programming language | 58,417 |
| github.com/google/guava | Google Core Libraries for Java 6+ | 374,068 |
| github.com/dropwizard/metrics | Capturing JVM- and application-level metrics, so you know what’s going on | 24,242 |
| github.com/orientechnologies/orientdb | An Open Source NoSQL DBMS with the features of both Document and Graph DBMSs | 168,924 |
| github.com/square/retrofit | Type-safe HTTP client for Android and Java by Square, Inc. | 17,073 |
| github.com/spring-projects/spring-boot | Spring Boot makes it easy to create Spring-powered, production-grade applications and services with absolute minimum fuss | 39,190 |

| Ref. Type | # | Supported by RDiff | Supported by RMinr | Supported by RCraw | Supported by RFind |
|---|---|---|---|---|---|
| Rename Type | 35 | yes | yes | yes | no |
| Move Type | 31 | yes | yes | no | no |
| Extract Superclass | 16 | yes | yes | no | yes |
| Rename Method | 70 | yes | yes | yes | yes |
| Pull Up Method | 15 | yes | yes | yes | yes |
| Push Down Method | 68 | yes | yes | yes | yes |
| Move Method | 31 | yes | yes | yes | yes |
| Extract Method | 29 | yes | yes | no | yes |
| Inline Method | 52 | yes | yes | no | yes |
| Pull Up Field | 33 | yes | yes | no | yes |
| Push Down Field | 42 | yes | yes | no | yes |
| Move Field | 26 | yes | yes | no | yes |
| Total | 448 | | | | |

| Ref. Type | RDiff Precision | RDiff Recall | RMinr Precision | RMinr Recall | RCraw Precision | RCraw Recall | RFind Precision | RFind Recall |
|---|---|---|---|---|---|---|---|---|
| Rename Type | 1.000 | 1.000 | 1.000 | 1.000 | 0.750 | 0.429 | | |
| Move Type | 1.000 | 0.968 | 1.000 | 0.968 | | | | |
| Extract Superclass | 1.000 | 0.875 | 1.000 | 0.875 | | | 0.484 | 0.938 |
| Rename Method | 1.000 | 0.943 | 1.000 | 0.886 | 0.971 | 0.486 | 0.868 | 0.843 |
| Pull Up Method | 1.000 | 0.600 | 1.000 | 0.733 | 0.500 | 0.067 | 1.000 | 0.571 |
| Push Down Method | 1.000 | 0.971 | 1.000 | 0.176 | 1.000 | 0.265 | 1.000 | 0.491 |
| Move Method | 1.000 | 1.000 | 1.000 | 0.742 | 0.090 | 0.323 | 0.054 | 0.759 |
| Extract Method | 1.000 | 0.897 | 1.000 | 0.862 | | | 0.607 | 0.586 |
| Inline Method | 1.000 | 0.981 | 1.000 | 0.423 | | | 0.917 | 0.688 |
| Pull Up Field | 1.000 | 0.576 | 1.000 | 0.970 | | | 1.000 | 0.394 |
| Push Down Field | 1.000 | 0.929 | 1.000 | 0.929 | | | 1.000 | 0.333 |
| Move Field | 1.000 | 0.269 | 0.583 | 0.808 | | | 0.097 | 0.923 |

| Approach | TP | FP | FN | Precision | Recall |
|---|---|---|---|---|---|
| RDiff | 393 | 0 | 55 | 1.000 | 0.877 |
| RMinr | 326 | 15 | 122 | 0.956 | 0.728 |
| RCraw | 78 | 108 | 141 | 0.419 | 0.356 |
| RFind | 231 | 645 | 129 | 0.264 | 0.642 |
| RCraw* | 78 | 56 | 141 | 0.582 | 0.356 |
| RFind* | 231 | 241 | 129 | 0.489 | 0.642 |
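
The precision and recall values in the table above follow from the TP, FP, and FN counts in the usual way: precision = TP / (TP + FP) and recall = TP / (TP + FN). As a minimal sketch, the following Python snippet recomputes the RDiff and RMinr rows; the function name `precision_recall` is our own and is not part of any of the evaluated tools.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts copied from the table above: (TP, FP, FN).
rows = {"RDiff": (393, 0, 55), "RMinr": (326, 15, 122)}

for name, (tp, fp, fn) in rows.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
# RDiff: precision=1.000 recall=0.877
# RMinr: precision=0.956 recall=0.728
```
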

| Repository | Commits | RDiff Min. (ms) | RDiff Max. (ms) | RDiff Avg. (ms) | RDiff Total (s) | RMinr Min. (ms) | RMinr Max. (ms) | RMinr Avg. (ms) | RMinr Total (s) |
|---|---|---|---|---|---|---|---|---|---|
| androidannotations/androidannotations | 29 | 1 | 4,956 | 451 | 13 | 1 | 1,753 | 211 | 6 |
| bumptech/glide | 41 | 1 | 3,349 | 594 | 24 | 2 | 8,992 | 466 | 19 |
| elastic/elasticsearch | 946 | 1 | 42,344 | 1,897 | 1,795 | 1 | 103,943 | 1,105 | 1,046 |
| libgdx/libgdx | 69 | 0 | 5,112 | 805 | 56 | 1 | 6,774 | 578 | 40 |
| netty/netty | 225 | 0 | 3,384 | 640 | 144 | 0 | 59,736 | 665 | 150 |
| PhilJay/MPAndroidChart | 14 | 1 | 816 | 245 | 3 | 1 | 310 | 79 | 1 |
| ReactiveX/RxJava | 120 | 1 | 810,744 | 10,475 | 1,257 | 1 | 17,369 | 538 | 65 |
| spring-projects/spring-framework | 478 | 1 | 15,019 | 1,205 | 576 | 1 | 6,133 | 920 | 440 |
| square/okhttp | 45 | 1 | 1,526 | 380 | 17 | 1 | 616 | 178 | 8 |
| zxing/zxing | 23 | 1 | 773 | 342 | 8 | 1 | 502 | 230 | 5 |
| Total | 1,990 | 0 | 810,744 | 1,956 | 3,893 | 0 | 103,943 | 894 | 1,779 |

RefDiff: Detecting Refactorings in Version Histories
Danilo Silva, Marco Tulio Valente
Department of Computer Science
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil
Abstract
Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is valuable information for understanding software evolution, adapting software components, merging code changes, and other applications. In this paper, we present RefDiff, an automated approach that identifies refactorings performed between two code revisions in a git repository. RefDiff employs a combination of heuristics based on static analysis and code similarity to detect 13 well-known refactoring types. In an evaluation using an oracle of 448 known refactoring operations, distributed across seven Java projects, our approach achieved precision of 100% and recall of 88%. Moreover, our evaluation suggests that RefDiff has higher precision and recall than existing state-of-the-art approaches.
Index Terms:
refactoring; software evolution; software repositories; git.
I Introduction
Refactoring is a well-known technique to improve the design of a system and enable its evolution [1]. In fact, existing studies [2, 3, 4, 5, 6] present strong evidence that refactoring is frequently applied by development teams, and that it is an important aspect of their software maintenance workflow.
Therefore, knowing about the refactoring activity in a code change is valuable information that helps researchers understand software evolution. For example, past studies have used such information to shed light on important aspects of refactoring practice, such as: how developers refactor [2], the usage of refactoring tools [7, 2], the motivations driving refactoring [4, 5, 6], the risks of refactoring [4, 5, 8, 9, 10], and the impact of refactoring on code quality metrics [4, 5]. Moreover, knowing which refactoring operations were applied in the version history of a system may help in several practical tasks. For example, in a study by Kim et al. [4], many developers mentioned the difficulties they face when reviewing or integrating code changes after large refactoring operations, which move or rename several code elements. As a result, developers feel discouraged from refactoring their code. If a tool is able to identify such refactoring operations, it can possibly resolve merge conflicts automatically. Diff visualization tools can also benefit from such information, presenting refactored code elements side-by-side with their corresponding version before the change. Another application for such information is adapting client code to a refactored version of an API it uses [11, 12]. If we are able to detect the refactorings that were applied to an API, we can replay them on the client code automatically.
Although there are approaches capable of detecting refactorings automatically, some issues still hinder their application. Specifically, the precision and recall of such approaches still need improvement. In this paper, we try to fill this gap by proposing RefDiff, an automated approach that identifies refactorings performed in the version history of a system. RefDiff employs a combination of heuristics based on static analysis and code similarity to detect 13 well-known refactoring types. When compared to existing approaches, RefDiff leverages existing techniques and also introduces some novel ideas, such as the adaptation of the classical TF-IDF similarity measure from information retrieval to compare refactored code elements, and a new strategy to compare the similarity of fields by taking into account the similarity of the statements that read from or write to them.
In the paper, we also describe in detail a study to evaluate the precision and recall of RefDiff and three existing refactoring detection approaches: Refactoring Miner [6], Refactoring Crawler [13], and Ref-Finder [14, 15]. In our study, RefDiff achieved precision of 100% and recall of 88%, which were the best results among the evaluated approaches.
In summary, the contributions we deliver in this work are:
- RefDiff, which is a new approach to detect refactorings in version histories. We provide a publicly available implementation of our approach (RefDiff and all evaluation data are publicly available on GitHub: https://github.com/aserg-ufmg/RefDiff) that is capable of finding refactorings in Java code within git repositories in a fully automated way;
- a publicly available oracle of 448 known refactoring operations, applied to seven Java systems, that serves as an evaluation benchmark for refactoring detection approaches; and
- an evaluation of the precision and recall of RefDiff, comparing it with three state-of-the-art approaches.
The remainder of this paper is structured as follows. Section II describes related work, focusing on the three approaches we compare with RefDiff. Section III presents the proposed approach in detail. Section IV describes how we evaluated RefDiff and discusses the achieved results. Section V discusses threats to validity, and Section VI concludes the paper.
II Related Work
Empirical studies on refactoring rely on means to identify refactoring activity. Thus, many different techniques have been proposed and employed for this task. For example, Murphy-Hill et al. [2] collected refactoring usage data using a framework that monitors user actions in the Eclipse IDE, including calls to refactoring commands. Negara et al. [7] also used the strategy of instrumenting the IDE to infer refactorings from fine-grained code edits. Other studies use metadata from version control systems to identify refactoring changes. For example, Ratzinger et al. [16] search for a predefined set of terms in commit messages to classify them as refactoring changes. In specific scenarios, a branch may be created exclusively to refactor the code, as reported by Kim et al. [5]. Another strategy is employed by Soares et al. [17]. They propose an approach that identifies behavior-preserving changes by automatically generating and running test cases. While their approach is intended to guarantee the correct behavior of a system after refactoring, it may also be employed to classify commits as behavior-preserving. Moreover, many existing approaches are based on static analysis. This is the case of the approach proposed by Demeyer et al. [18], which finds refactored elements by observing changes in code metrics.
Static analysis is also frequently used to find differences in the source code [13, 19, 3, 14, 15]. Approaches based on comparing source code differences have the advantage of being able to identify each refactoring operation performed. As RefDiff is one of these approaches, it can be directly compared with others within this category. In the next sections, we describe three such approaches.
II-A Refactoring Miner
Refactoring Miner is an approach introduced by Tsantalis et al. [3] that was later extended by Silva et al. [6] to mine refactorings at large scale in git repositories. This tool is capable of identifying 14 high-level refactoring types: Rename Package/Class/Method, Move Class/Method/Field, Pull Up Method/Field, Push Down Method/Field, Extract Method, Inline Method, and Extract Superclass/Interface.
Refactoring Miner runs a lightweight algorithm, similar to the UMLDiff proposed by Xing and Stroulia [20], for differencing object-oriented models, inferring the set of classes, methods, and fields added, deleted or moved between two code revisions. First, the algorithm matches code entities in a top-down order (starting from the classes and going to the methods and fields) looking for exact matches on their names and signatures (in the case of methods). Next, the removed/added elements between the two models are matched based only on the equality of their names in order to find changes in the signatures of fields and methods. Third, the removed/added classes are matched based on the similarity of their members at signature level. Finally, a set of rules enforcing structural constraints is applied to identify specific types of refactorings.
In a first study, using the version histories of JUnit, HTTPCore, and HTTPClient, Tsantalis et al. [3] found 8 false positives for the Extract Method refactoring (96.4% precision) and 4 false positives for the Rename Class refactoring (97.6% precision). No false positives were found for the remaining refactorings. In a second study that mined refactorings in 285 GitHub hosted Java repositories, Silva et al. [6] found 1,030 false positives out of 2,441 refactorings (63% precision). However, the authors also evaluated Refactoring Miner using as a benchmark the dataset reported by Chaparro et al. [21], in which it achieved 93% precision and 98% recall.
II-B Refactoring Crawler
Refactoring Crawler, proposed by Dig et al. [13], is an approach capable of finding seven high-level refactoring types: Rename Package/Class/Method, Pull Up Method, Push Down Method, Move Method, and Change Method Signature. It uses a combination of a syntactic analysis to detect refactoring candidates and a more expensive reference graph analysis to refine the results.
First, Refactoring Crawler analyzes the abstract syntax tree of a program and produces a tree, in which each node represents a source code entity (package, class, method, or field). Then, it employs a technique known as shingles encoding to find similar pairs of entities, which are candidates for refactorings. Shingles are representations for strings with the following property: if a string changes slightly, then its shingles also change slightly. In a second phase, Refactoring Crawler applies specific strategies for detecting each refactoring type, and computes a more costly metric that determines the similarity of references among code entities in the two versions of the system. For example, two methods are similar if the sets of methods that call them are similar, and the sets of methods they call are also similar. The strategies to detect refactorings are repeated in a loop until no new refactorings are found. Therefore, the detection of a refactoring, such as a rename, may change the reference graph of code elements and enable the detection of new refactorings.
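
The shingles property described above (a small change in a string produces a small change in its shingles) can be illustrated with a short Python sketch. This is a generic word-level k-shingle comparison using Jaccard similarity, not Refactoring Crawler's actual implementation; the function names, the shingle size `k`, and the use of Jaccard similarity are assumptions made for illustration.

```python
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-token windows (shingles) of a whitespace-tokenized text."""
    tokens = text.split()
    if len(tokens) < k:
        # Texts shorter than k tokens yield a single, shorter shingle (or none).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two token sequences that differ only in their last token.
before = shingles("a b c d e")
after = shingles("a b c d f")
similarity = jaccard(before, after)  # 2 shared shingles out of 4 distinct ones
```

Because only the shingles that overlap the edited token change, the similarity degrades gradually with the size of the edit, which is what makes such pairs usable as refactoring candidates.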
The authors evaluated Refactoring Crawler comparing pairs of releases of three open source software components: Eclipse UI, Struts, and JHotDraw. Such components were chosen because they provided detailed release notes describing API changes. The authors relied on such information and on manual inspection to build an oracle of known refactorings in those releases, containing 131 refactorings in total. The reported results are: Eclipse UI (90% precision and 86% recall), Struts (100% precision and 86% recall), and JHotDraw (100% precision and 100% recall).
II-C Ref-Finder
Ref-Finder, proposed by Prete et al. [14, 15], is an approach based on logic programming capable of identifying 63 refactoring types from Fowler’s catalog [1]. The authors express each refactoring type by defining structural constraints, before and after applying a refactoring to a program, in terms of template logic rules.
First, Ref-Finder traverses the abstract syntax tree of a program and extracts facts about code elements, structural dependencies, and the content of code elements, to represent the program in terms of a database of logic facts. Then, it uses a logic programming engine to infer concrete refactoring instances, by creating a logic query based on the constraints defined for each refactoring type. The definition of refactoring types also considers ordering dependencies among them. This way, lower-level refactorings may be queried to identify higher-level, composite refactorings. The detection of some types of refactoring requires a special logic predicate that indicates that the similarity between two methods is above a threshold. For this purpose, the authors implemented a block-level clone detection technique, which removes any leading and trailing parentheses, escape characters, white spaces, and return keywords, and computes word-level similarity between the two texts using the longest common subsequence algorithm.
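
A word-level similarity based on the longest common subsequence (LCS) can be sketched in Python as follows. This is only an illustration of the general technique, not Ref-Finder's code: the normalization `2 * LCS / (len(a) + len(b))` and the function names are assumptions, and the preprocessing step that strips parentheses, escape characters, and return keywords is omitted.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity in [0, 1]; the normalization is an assumption."""
    a, b = text_a.split(), text_b.split()
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Two bodies sharing the words "a", "c", "d" in the same order: 2 * 3 / 8 = 0.75.
score = word_similarity("a b c d", "a c d e")
```

A predicate such as "the similarity between two methods is above a threshold" would then compare `score` against a chosen cutoff.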
The authors evaluated Ref-Finder in two case studies. In the first one, they used code examples from Fowler’s catalog to create instances of the 63 refactoring types. The authors reported 93.7% recall and 97.0% precision for this first study. In the second study, the authors used three open-source projects: Carol, jEdit, and Columba. In this case, Ref-Finder was executed on randomly selected pairs of versions. From the 774 refactoring instances found, the authors manually inspected a sample of 344 instances and found that 254 were correct (73.8% precision). However, in a study by Soares et al. [22] using a set of randomly selected versions of JHotDraw and Apache Commons Collections containing 81 refactoring instances in total, Ref-Finder achieved only 35% precision and 24% recall.
III Proposed Refactoring Detection Algorithm
RefDiff employs a combination of heuristics based on static analysis and code similarity to detect refactorings between two revisions of a system. Thus, RefDiff takes as input two versions of a system, and outputs a list of refactorings found.
The detection algorithm is divided into two main phases: Source Code Analysis and Relationship Analysis. In the first phase, the source code of the system is parsed and analyzed to build a model that represents each high-level source code entity, such as types, methods, and fields. Two models are built to represent the system before ($E_b$) and after the changes ($E_a$). For efficiency, only code entities that belong to modified source files (added, removed, or edited) are analyzed. Each of these two models is a set of types, methods, and fields contained in the source code. Specifically, $E_b = T_b \cup M_b \cup F_b$, such that $T_b$, $M_b$, and $F_b$ are the sets of types, methods, and fields in the source code before the changes, and $E_a = T_a \cup M_a \cup F_a$, such that $T_a$, $M_a$, and $F_a$ are the sets of types, methods, and fields after the changes.
The second phase of the algorithm, Relationship Analysis, consists of finding relationships between source code entities before and after the code changes. Specifically, the algorithm builds a bipartite graph with two sets of vertices: code entities before ($E_b$) and code entities after ($E_a$). The edges of this graph are represented by the set $R$ of relationships between code entities. For example, a certain method $m_b \in M_b$ may correspond to a method $m_a \in M_a$ that was renamed by a developer. This would correspond to a Rename Method relationship between $m_b$ and $m_a$ and, consequently, to a Rename Method refactoring.
Table I presents all relationships that RefDiff can identify between types, methods, or fields. We search for relationships between source code entities considering each relationship type in the order they are presented in the table. The following sections detail how such relationships are identified.
III-A Matching Relationships
Some kinds of relationships map code entities before the change to code entities after the change. For example, let $t_b$ be a type in the version before the change. If our algorithm finds another type $t_a$ in the version after the change with the same qualified name, it adds a relationship Same Type between $t_b$ and $t_a$ in the set of relationships $R$. This is a matching relationship, because $t_a$ corresponds to $t_b$ after the change. Other examples of matching relationships are Move Type, Rename Type, and Pull Up Method. In contrast, suppose that our algorithm finds that $m_a$ is a method that was extracted from another method $m_b$. In this case, there is an Extract Method relationship between $m_b$ and $m_a$, but this is not a matching relationship, because $m_a$ does not correspond to $m_b$ after the change. From this point on, we use the notation $e_b \sim e_a$ to represent a matching relationship between $e_b$ and $e_a$.
We distinguish matching relationships from non-matching relationships because all matching relationships are detected by a similar algorithm. For each matching relationship type, we find all pairs of entities that fall under the conditions specified in Table I. Each relationship type has its specific conditions. For example, as presented in Table I, the conditions for identifying a Rename Method between a method $m_b$ (before the change) and a method $m_a$ (after the change) are:
- the names of $m_b$ and $m_a$ should be different;
- there should exist a matching relationship between the container classes of $m_b$ and $m_a$; and
- the similarity index between $m_b$ and $m_a$, denoted by $sim(m_b, m_a)$, should be greater than a threshold $\tau$.
Whenever these conditions hold, we add the triple $(m_b, m_a, sim(m_b, m_a))$ to a list of potential Rename Method relationships.
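
The three conditions above can be sketched as a candidate-generation loop. The Python snippet below is an illustrative sketch, not RefDiff's implementation: the `Method` class, the `matched_types` mapping (container class before the change to its matching class after the change), and the pluggable `similarity` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Method:
    container: str  # qualified name of the declaring class (hypothetical model)
    name: str
    body: str


def rename_method_candidates(
    before: List[Method],
    after: List[Method],
    matched_types: Dict[str, str],
    similarity: Callable[[Method, Method], float],
    threshold: float,
) -> List[Tuple[Method, Method, float]]:
    """Collect (before, after, similarity) triples satisfying the three conditions."""
    candidates = []
    for m_b in before:
        for m_a in after:
            if m_b.name == m_a.name:
                continue  # condition 1: the names must differ
            if matched_types.get(m_b.container) != m_a.container:
                continue  # condition 2: the container classes must match
            score = similarity(m_b, m_a)
            if score > threshold:  # condition 3: similarity above the threshold
                candidates.append((m_b, m_a, score))
    return candidates
```

The resulting list may still contain conflicting triples (for example, one method paired with two different candidates), which is why a separate selection step is needed.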
The last step to find the actual relationships consists of selecting non-conflicting relationships from the list of potential relationships and adding them to the graph. For example, the list may contain two potential Rename Method relationships that involve the same method $m_b$: $(m_b, m_a, s)$ and $(m_b, m_a', s')$. However, a code entity cannot be involved in more than one matching relationship. Thus, only one of them can be chosen, because $m_b$ could not have been renamed to both $m_a$ and $m_a'$. The criterion we use is to choose the triple with the higher similarity index. This means that, in the aforementioned example, if $s > s'$, we would choose the triple $(m_b, m_a, s)$ and discard $(m_b, m_a', s')$. In Section III-C we describe in detail how the similarity index is computed.
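
The selection of non-conflicting relationships can be sketched as a greedy pass over the candidate triples. This Python snippet is a sketch under assumptions: the text only states that the triple with the higher similarity index wins a conflict, so processing all candidates globally in descending order of similarity is our interpretation, and the function name and string identifiers are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, str, float]  # (entity before, entity after, similarity index)


def select_non_conflicting(candidates: List[Candidate]) -> List[Candidate]:
    """Greedily keep the most similar triples so that no entity is matched twice."""
    chosen: List[Candidate] = []
    used_before, used_after = set(), set()
    # Visit candidates from the highest to the lowest similarity index.
    for before, after, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if before in used_before or after in used_after:
            continue  # conflicts with an already chosen, more similar triple
        chosen.append((before, after, score))
        used_before.add(before)
        used_after.add(after)
    return chosen


candidates = [
    ("A.foo", "A.bar", 0.6),
    ("A.foo", "A.baz", 0.9),  # conflicts with the first triple on A.foo and wins
    ("A.qux", "A.bar", 0.5),
]
selected = select_non_conflicting(candidates)
# selected == [("A.foo", "A.baz", 0.9), ("A.qux", "A.bar", 0.5)]
```
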
III-B Non-matching Relationships
In the previous section, we discussed that an entity cannot be involved in multiple matching relationships, but this property does not hold for non-matching relationships. For example, suppose that a developer extracted some code from a method $m_b$ into a new method $m_a$, i.e., an Extract Method refactoring was applied. It is also possible that the developer extracted another part of $m_b$ into a new method $m_a'$.
Given that non-matching relationships do not conflict with each other, the algorithm to identify them is simpler. We just need to find all pairs of entities that fall under the conditions specified in Table I. For example, the conditions for identifying an Extract Method relationship between a method $m_b$ (before the change) and a method $m_a$ (after the change) are:
- there should not exist a method $x$ before the change such that $x \sim m_a$ (i.e., $m_a$ was added);
- there should exist a method $m_b'$ after the change such that $m_b \sim m_b'$ (i.e., $m_b$ was not removed);
- $m_b'$ should call $m_a$; and
- the similarity index between $m_b$ and $m_a$, denoted by $sim(m_b, m_a)$, should be greater than a threshold $\tau$.
Besides Extract Method, our approach supports the detection of Inline Method and Extract Supertype relationships.
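
The Extract Method conditions listed above can be sketched in Python as follows. This is an illustrative sketch rather than RefDiff's implementation: methods are represented by plain string identifiers, `matching` maps each method before the change to its counterpart after the change, `callees` maps each method after the change to the set of methods it calls, and `similarity` is a pluggable function. All of these names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


def extract_method_candidates(
    before: List[str],
    after: List[str],
    matching: Dict[str, str],
    callees: Dict[str, Set[str]],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Return (source method, extracted method) pairs satisfying the conditions."""
    matched_after = set(matching.values())
    found = []
    for m_a in after:
        if m_a in matched_after:
            continue  # m_a has a counterpart before the change, so it was not added
        for m_b in before:
            counterpart = matching.get(m_b)
            if counterpart is None:
                continue  # m_b has no counterpart after the change: it was removed
            if m_a not in callees.get(counterpart, set()):
                continue  # the surviving method must call the extracted one
            if similarity(m_b, m_a) > threshold:
                found.append((m_b, m_a))
    return found
```

No conflict resolution is applied here, so the same source method may appear in several pairs when different parts of it were extracted into different new methods.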
III-C Computing Similarity
References
- [1] M. Fowler, Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
- [2] E. R. Murphy-Hill, C. Parnin, and A. P. Black, “How we refactor, and how we know it,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 5–18, 2012.
- [3] N. Tsantalis, V. Guana, E. Stroulia, and A. Hindle, “A multidimensional empirical study on refactoring activity,” in Conference of the Centre for Advanced Studies on Collaborative Research (CASCON), 2013, pp. 132–146.
- [4] M. Kim, T. Zimmermann, and N. Nagappan, “A field study of refactoring challenges and benefits,” in 20th Symposium on the Foundations of Software Engineering (FSE), 2012, pp. 50:1–50:11.
- [5] ——, “An empirical study of refactoring challenges and benefits at Microsoft,” IEEE Transactions on Software Engineering, vol. 40, no. 7, July 2014.
- [6] D. Silva, N. Tsantalis, and M. T. Valente, “Why we refactor? Confessions of GitHub contributors,” in 24th Symposium on the Foundations of Software Engineering (FSE), 2016, pp. 858–870.
- [7] S. Negara, N. Chen, M. Vakilian, R. E. Johnson, and D. Dig, “A comparative study of manual and automated refactorings,” in 27th European Conference on Object-Oriented Programming (ECOOP), 2013, pp. 552–576.
- [8] M. Kim, D. Cai, and S. Kim, “An empirical investigation into the role of API-level refactorings during software evolution,” in 33rd International Conference on Software Engineering (ICSE), 2011, pp. 151–160.
