Fork Entropy: Assessing the Diversity of Open Source Software Projects' Forks
Liang Wang, Zhiwen Zheng, Xiangchen Wu, Baihui Sang, Jierui Zhang, Xianping Tao

TL;DR
This paper introduces a novel fork entropy metric based on Rao's quadratic entropy to quantify the diversity of forks in open source projects, providing deeper insights beyond mere fork counts.
Contribution
The paper proposes a new fork entropy measure for OSS project forks and demonstrates its effectiveness through empirical analysis of GitHub projects.
Findings
Fork entropy correlates with external productivity and bug reports.
Fork entropy interacts with the number of forks to influence project outcomes.
The metric offers a simple, effective way to assess fork diversity.
Abstract
On open source software (OSS) platforms such as GitHub, forking and accepting pull-requests is an important approach for OSS projects to receive contributions, especially from external contributors who cannot directly commit into the source repositories. Having a large number of forks is often considered as an indicator of a project being popular. While extensive studies have been conducted to understand the reasons of forking, communications between forks, features and impacts of forks, there are few quantitative measures that can provide a simple yet informative way to gain insights about an OSS project's forks besides their count. Inspired by studies on biodiversity and OSS team diversity, in this paper, we propose an approach to measure the diversity of an OSS project's forks (i.e., its fork population). We devise a novel fork entropy metric based on Rao's quadratic entropy to…
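For readers unfamiliar with Rao's quadratic entropy, the sketch below shows the general form of the index, Q = Σ_i Σ_j d_ij · p_i · p_j, where p_i is the relative abundance of category i and d_ij is the dissimilarity between categories i and j. It is only an illustration of that general formula. How the paper assigns forks to categories and how it defines the distance between them is not stated in this summary, so the function name `rao_quadratic_entropy`, the category proportions, and the distance matrix are all hypothetical.

```python
from typing import Sequence


def rao_quadratic_entropy(
    abundances: Sequence[float],
    distances: Sequence[Sequence[float]],
) -> float:
    """Generic Rao's quadratic entropy: sum over i, j of d_ij * p_i * p_j.

    `abundances` holds non-negative weights per category (hypothetical here:
    for example, how many forks fall into each category). They are
    normalised to proportions before use.
    `distances` is a square dissimilarity matrix with d_ii = 0.
    """
    n = len(abundances)
    if len(distances) != n or any(len(row) != n for row in distances):
        raise ValueError("distances must be an n x n matrix")
    if any(a < 0 for a in abundances):
        raise ValueError("abundances must be non-negative")

    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")

    # Normalise raw abundances to proportions p_i that sum to 1.
    p = [a / total for a in abundances]

    # Expected dissimilarity between two items drawn at random
    # (with replacement) from the population.
    return sum(
        p[i] * p[j] * distances[i][j]
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    # Hypothetical example: three equally common categories that are all
    # maximally different from one another (distance 1 off the diagonal).
    unit = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(rao_quadratic_entropy([1, 1, 1], unit))  # about 0.667
```

When every pair of distinct categories has distance 1, as in the example, the index reduces to the Gini-Simpson index, 1 − Σ p_i². A population concentrated in a single category scores 0, and the score rises as the population spreads across categories that are farther apart.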
Taxonomy
Topics: Software Engineering Research · Open Source Software Innovations · Scientific Computing and Data Management
