Standing on Shoulders or Feet? An Extended Study on the Usage of the MSR Data Papers
Zoe Kotti, Konstantinos Kravvaritis, Konstantina Dritsa, Diomidis Spinellis

TL;DR
This study analyzes the usage, impact, and citation patterns of MSR data papers, revealing that most are used in subsequent research, with variations based on data type and author involvement, informing future data sharing practices.
Contribution
It provides an empirical analysis of MSR data papers' usage, citation patterns, and author motivations, offering insights into data paper impact and areas for improvement.
Findings
65% of data papers are used in other studies
Version Control System data papers are most frequently cited
Enhanced developer data papers are the least common and are cited less often
Abstract
Context: The establishment of the Mining Software Repositories (MSR) data showcase conference track has encouraged researchers to provide data sets as a basis for further empirical studies. Objective: Examine the usage of data papers published in the MSR proceedings in terms of use frequency, users, and use purpose. Method: Data track papers were collected from the MSR data showcase track and through the manual inspection of older MSR proceedings. The use of data papers was established through manual citation searching followed by reading the citing studies and dividing them into strong and weak citations. Contrary to weak, strong citations truly use the data set of a data paper. Data papers were then manually clustered based on their content, whereas their strong citations were classified by hand according to the knowledge areas of the Guide to the Software Engineering Body of…
