MSR Mining Challenge: The SmartSHARK Repository Mining Data

Alexander Trautsch; Fabian Trautsch; Steffen Herbold

arXiv:2102.11540 · cs.SE · August 5, 2021 · 6 cites

Open Access

TL;DR

The paper introduces the SmartSHARK repository mining data, a comprehensive dataset capturing detailed software project evolution, including changes, issues, CI, pull requests, and annotations, facilitating advanced research in software engineering.

Contribution

It presents a unique, richly annotated dataset combining diverse data sources and labels, enabling complex longitudinal and multi-source analyses in software evolution research.

Findings

01 Rich, detailed data enables complex analyses.

02 Annotations improve data usability.

03 Supports longitudinal and multi-source research.

Abstract

The SmartSHARK repository mining data is a collection of rich and detailed information about the evolution of software projects. The data is unique in its diversity and contains detailed information about each change, issue tracking data, continuous integration data, as well as pull request and code review data. Moreover, the data contains not only raw data scraped from repositories but also annotations in the form of labels determined through a combination of manual analysis and heuristics, as well as links between the different parts of the data set. The SmartSHARK data set provides a rich source of data that enables us to explore research questions that require data from different sources and/or longitudinal data over time.
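
To make the idea of linked, multi-source, longitudinal data concrete, the following is a minimal sketch of the kind of question such links allow: counting, per year, the commits that are linked to an issue labeled as a bug. The records, the field names (`linked_issue_ids`, `type`), and the helper `bug_fix_commits_per_year` are hypothetical and chosen for illustration only; they are not taken from the actual SmartSHARK schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical in-memory records. Field names are illustrative assumptions,
# not the real SmartSHARK schema.
commits = [
    {"id": "c1", "date": date(2016, 3, 1), "linked_issue_ids": ["i1"]},
    {"id": "c2", "date": date(2016, 9, 12), "linked_issue_ids": []},
    {"id": "c3", "date": date(2017, 1, 20), "linked_issue_ids": ["i2"]},
    {"id": "c4", "date": date(2017, 6, 5), "linked_issue_ids": ["i3"]},
]
issues = {
    "i1": {"type": "bug"},
    "i2": {"type": "improvement"},
    "i3": {"type": "bug"},
}


def bug_fix_commits_per_year(commits, issues):
    """Count commits per year that link to at least one issue of type 'bug'."""
    counts = defaultdict(int)
    for commit in commits:
        # Follow the commit -> issue links (version control joined with issue tracking).
        links_bug = any(
            issues.get(issue_id, {}).get("type") == "bug"
            for issue_id in commit["linked_issue_ids"]
        )
        if links_bug:
            # Group by year to obtain a simple longitudinal view.
            counts[commit["date"].year] += 1
    return dict(counts)


result = bug_fix_commits_per_year(commits, issues)
print(result)
```

The join across two sources (commits and issues) plus the grouping by year is the pattern; against the real data set, the same question would be asked of the stored collections rather than in-memory lists.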

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Open Source Software Innovations