Who Made This Copy? An Empirical Analysis of Code Clone Authorship
Reishi Yokomori, Katsuro Inoue

TL;DR
This paper empirically investigates code clone authorship in Java projects, revealing clone prevalence, author contribution patterns, and multi-author clone sets to inform better clone management strategies.
Contribution
It provides the first empirical analysis of code clone authorship across multiple projects, highlighting clone characteristics and authorship patterns.
Findings
On average, 18.5% of lines across projects are clone lines
Authors contributing to non-clone lines also contribute to clone lines
About one-third of clone sets are written mainly by multiple authors rather than by a single author
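To make the findings above concrete, here is a minimal sketch of how line-level clone authorship could be measured. It is not the authors' tooling: the function names, the input shapes (a blame map from (file, line) to author, and clone sets as lists of line ranges), and the 0.8 dominance threshold used to decide "mainly multiple authors" are all assumptions made for illustration.

```python
from collections import Counter


def clone_line_ratio(total_lines, clone_lines):
    """Fraction of all lines that belong to at least one clone.

    clone_lines is a set of (file, line_no) pairs, so a line that
    appears in several clone sets is counted only once.
    """
    return len(clone_lines) / total_lines if total_lines else 0.0


def author_shares(blame, clone_set):
    """Share of a clone set's lines attributed to each author.

    blame:     dict mapping (file, line_no) -> author name (hypothetical input,
               e.g. derived from per-line blame data)
    clone_set: list of (file, start, end) snippets, end inclusive
    """
    counts = Counter(
        blame[(path, n)]
        for path, start, end in clone_set
        for n in range(start, end + 1)
    )
    total = sum(counts.values())
    return {author: c / total for author, c in counts.items()} if total else {}


def is_multi_author(blame, clone_set, dominance=0.8):
    """True if no single author wrote at least `dominance` of the lines.

    The 0.8 threshold is an assumption for this sketch, not the paper's
    definition of a clone set "mainly" written by multiple authors.
    """
    shares = author_shares(blame, clone_set)
    return bool(shares) and max(shares.values()) < dominance


# Toy data: A.java has 10 lines (1-4 by alice, 5-10 by bob); B.java has 4 by carol.
blame = {("A.java", n): "alice" for n in range(1, 5)}
blame.update({("A.java", n): "bob" for n in range(5, 11)})
blame.update({("B.java", n): "carol" for n in range(1, 5)})

cross_author_set = [("A.java", 1, 4), ("B.java", 1, 4)]   # alice + carol
single_author_set = [("A.java", 5, 7), ("A.java", 8, 10)]  # bob only
```

With this toy data, is_multi_author(blame, cross_author_set) is true because alice and carol each hold half of the lines, while single_author_set is attributed entirely to bob. Applying the same per-set classification across a project and averaging over projects would give figures comparable in kind to those listed above, subject to the threshold chosen.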
Abstract
Code clones are code snippets that are identical or similar to other snippets within the same or different files. They are often created through copy-and-paste practices during development and maintenance activities. Since code clones may require consistent updates and coherent management, they present a challenging issue in software maintenance. Therefore, many studies have been conducted to find various types of clones with accuracy, scalability, or performance. However, the exploration of the nature of code clones has been limited. Even the fundamental question of whether code snippets in the same clone set were written by the same author or different authors has not been thoroughly investigated. In this paper, we investigate the characteristics of code clones with a focus on authorship. We analyzed the authorship of code clones at the line-level granularity for Java files in 153…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
