Open Source Software Development Challenges: A Systematic Literature Review on GitHub
Abdulkadir Şeker, Banu Diri, Halil Arslan, Mehmet Fatih Amasyalı

TL;DR
This paper systematically reviews 172 studies that use the GHTorrent dataset, categorizing them by open source software development challenge on GitHub and highlighting research focuses, dataset advantages, limitations, and open issues.
Contribution
It provides the first comprehensive review of GHTorrent-based research on open source development challenges, categorizing studies and discussing dataset pros, cons, and open problems.
Findings
Identified key research themes in GHTorrent-based OSS studies
Highlighted dataset limitations and challenges in research
Mapped open issues and future directions in OSS development
Abstract
Git is the distributed version control system used by many open-source software projects. GitHub, a Git-based service, is the most common code hosting and repository service for open-source software projects. For researchers who study software engineering, the content hosted on these platforms provides valuable data. GitHub data can be obtained in several ways, such as GitHub Archive, the GitHub API, or GHTorrent. Among these options, GHTorrent is the most widely known and used GitHub dataset in the literature. Although there are some review studies about software engineering challenges across the GitHub platform, no review of GHTorrent-specific research is available. In this study, the 172 studies that use GHTorrent as a data source were categorized within the scope of open source software development challenges and a systematic literature review was carried…
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Open Source Software Innovations
