Survey Results on Threats To External Validity, Generalizability Concerns, Data Sharing and University-Industry Collaboration in Mining Software Repository (MSR) Research
Ashish Sureka, Ambika Tripathi, Savita Dabral

TL;DR
This survey investigates MSR researchers' views on external validity, data sharing, and university-industry collaboration, revealing limited data sharing practices and significant barriers to industrial data access.
Contribution
It provides empirical insights into current practices and perceptions regarding data sharing and collaboration in MSR research through a survey of recent conference authors.
Findings
About one-third of researchers always share datasets publicly.
Over 50% use only open-source datasets.
Difficulty in sharing industrial data hampers collaboration.
Abstract
Mining Software Repositories (MSR) is an applied and practice-oriented field aimed at solving real problems encountered by practitioners and bringing value to industry. Replication of results and findings, generalizability and external validity, University-Industry collaboration, data sharing, and the creation of dataset repositories are important issues in MSR research. Bibliometric analysis of MSR papers shows a lack of University-Industry collaboration, a deficiency of studies on closed or proprietary source datasets, and a lack of data as well as tool sharing by researchers. We conduct a survey of authors from the past three years of the MSR conference (2012, 2013 and 2014) to collect data on their views and suggestions to address the stated concerns. We asked more than 100 authors 20 questions and received responses from 39 authors. Our results show that about one-third of the…
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Data Mining Algorithms and Applications
