Should I Get Involved? On the Privacy Perils of Mining Software Repositories for Research Participants
Melina Vidoni, Nicolás E. Díaz Ferreyra

TL;DR
This paper discusses the privacy risks involved in mining software repositories for research, highlighting the need to balance data utility with participant privacy and ethical considerations.
Contribution
It introduces a discussion on privacy challenges and ethical issues related to participant data in MSR studies, emphasizing the importance of privacy-preserving practices.
Findings
Privacy risks linked to participant identities in MSRs
Potential for 'guilty by association' effects
Need for privacy-aware data sharing policies
Abstract
Mining Software Repositories (MSRs) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques as they allow researchers to unveil issues and flaws in software development so as to analyse the different factors contributing to them. Hence, counting on fine-grained information about the repositories and sources being mined (e.g., server names, and contributors' identities) is essential for the reproducibility and transparency of MSR studies. However, this can also introduce threats to participants' privacy as their identities may be linked to flawed/sub-optimal programming practices (e.g., code smells, improper documentation), or vice-versa. Moreover, this can extend to close collaborators and community members, making them "guilty by association". This…
Taxonomy
Topics: Software Engineering Research · Mobile Crowdsensing and Crowdsourcing · Open Source Software Innovations
