Exploring User Privacy Awareness on GitHub: An Empirical Study
Costanza Alfieri, Juri Di Rocco, Paola Inverardi, Phuong T. Nguyen

TL;DR
This empirical study investigates how GitHub users use the platform's privacy settings, documents disclosures of sensitive information in pull request comments, and explores the potential of language models for building personalized privacy protection tools.
Contribution
The paper provides the first comprehensive analysis of privacy setting usage on GitHub and proposes a methodology using language models for personalized privacy assistance.
Findings
Active user engagement with privacy settings
Disclosures of sensitive information in pull request comments
Potential for personalized privacy tools using language models
Abstract
GitHub provides developers with a practical way to distribute source code and collaborate on shared projects. To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication. However, despite these continued efforts, the platform still faces various issues related to the privacy of its users. This paper presents an empirical study of the GitHub ecosystem. Our focus is on investigating the use of privacy settings on the platform and identifying the types of sensitive information disclosed by users. Leveraging a dataset comprising 6,132 developers, we report and analyze their activity through their comments on pull requests. Our findings indicate that users actively engage with the privacy settings available on GitHub. Notably, we observe the disclosure of different forms of…
Taxonomy
Topics: Privacy, Security, and Data Protection
