Apples, Oranges & Fruits -- Understanding Similarity of Software Projects Through The Lens of Dissimilar Artifacts
A Eashaan Rao, Sridhar Chimalakonda

TL;DR
This paper explores whether software repositories can be judged similar by comparing artifacts of different types, such as documentation, commits, and source code.
Contribution
It demonstrates that dissimilar artifacts such as commits and documentation can reveal similarities between different software projects, challenging traditional artifact comparison methods.
Findings
Dissimilar artifacts can indicate repository similarity.
Similarities exist between dissimilar artifacts like commits and documentation.
The approach broadens understanding of software repository similarity.
Abstract
The growing availability of open source projects has made it easier for developers to reuse existing software artifacts and leverage them to develop new software. However, the notion of similarity is hard to pin down because it varies from developer to developer. Some developers might search for repositories with similar source code, while others might look for repositories with similar requirements or issues. Existing approaches tend to find similar projects by comparing artifacts of the same type: source code to source code, API usage to API usage, documentation to documentation, and so on. Even when two artifacts of the same type differ, there could be a similarity between two artifacts of different types. Hence, in this paper, we aim to answer the question: Can we find similarity of software repositories through dissimilar artifacts? To this end, we conduct an experiment to…
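To make the idea of comparing dissimilar artifacts concrete, here is a minimal sketch that scores the commit messages of one repository against the documentation of another. The abstract is cut off before it describes the experiment, so this is not the authors' method: the term-frequency cosine measure, the function names, and the three sample texts are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty artifact is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical artifacts from three different repositories (made-up text).
repo_a_commits = "fix json parser crash on empty input; add streaming json decoder"
repo_b_docs = "A fast streaming JSON parser and decoder for large input files."
repo_c_docs = "A command line tool for resizing and cropping images."

# Compare a commit log against documentation: two different artifact types.
score_ab = cosine_similarity(repo_a_commits, repo_b_docs)
score_ac = cosine_similarity(repo_a_commits, repo_c_docs)
print(f"commits(A) vs docs(B): {score_ab:.3f}")
print(f"commits(A) vs docs(C): {score_ac:.3f}")
```

In this toy setup the commit log of repository A shares vocabulary with the documentation of repository B but none with that of repository C, so the first score comes out higher. A real study would likely need a stronger text representation than raw term counts.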
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research
