Predicting the Popularity of GitHub Repositories
Hudson Borges, Andre Hora, Marco Tulio Valente

TL;DR
This paper uses multiple linear regression to predict the popularity of GitHub repositories, measured by their number of stars. The models accurately predict both a repository's star count and its rank among other repositories, which helps owners and clients assess how a project is performing.
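As a rough illustration of the technique named above, the following sketch fits a multiple linear regression by ordinary least squares. It is not the authors' implementation. The data is synthetic, and the choice of six monthly star counts as features, the weights in `TRUE_WEIGHTS`, and all function names are assumptions made for this example only.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, feature_row):
    """Apply the fitted model: intercept plus weighted sum of the features."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], feature_row))


# Hypothetical ground truth: intercept followed by one weight per month.
TRUE_WEIGHTS = [50.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

# Synthetic training set: 80 made-up repositories, each described by the
# stars it received in each of the last six months (assumed feature layout).
random.seed(0)
monthly_stars = [[float(random.randint(0, 200)) for _ in range(6)] for _ in range(80)]
future_stars = [predict(TRUE_WEIGHTS, row) for row in monthly_stars]

beta = fit_linear_regression(monthly_stars, future_stars)
print([round(b, 3) for b in beta])
```

Because the synthetic targets are generated without noise, the fit recovers the hypothetical weights; real star histories would not fit this cleanly.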
Contribution
It presents a large-scale analysis of star prediction models and recommends specific models, trained on repositories that share the same growth trend, to improve prediction accuracy.
Findings
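Predictions become accurate once the models are trained on the stars received in the last six months.
Specific models, built from repositories with the same growth trend, are effective for slow-growing repositories.
Predicted and actual rankings correlate strongly (rho > 0.95).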
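To make the rank-prediction finding concrete, the sketch below computes a rank correlation between predicted and actual star counts. It assumes that rho refers to Spearman's rank correlation coefficient, which the summary above does not state explicitly. The star counts and function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical actual and predicted star counts for five repositories.
actual_stars = [1200, 340, 560, 8900, 75]
predicted_stars = [1100, 400, 390, 9100, 80]

rho = spearman_rho(actual_stars, predicted_stars)
print(round(rho, 3))
```

Only the ordering of the repositories matters here: the predicted counts differ from the actual ones, but rho drops below 1 only because two repositories swap places in the ranking.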
Abstract
GitHub is the largest source code repository in the world. It provides a git-based source code management platform and also many features inspired by social networks. For example, GitHub users can show appreciation to projects by adding stars to them. Therefore, the number of stars of a repository is a direct measure of its popularity. In this paper, we use multiple linear regressions to predict the number of stars of GitHub repositories. These predictions are useful both to repository owners and clients, who usually want to know how their projects are performing in a competitive open source development market. In a large-scale analysis, we show that the proposed models start to provide accurate predictions after being trained with the number of stars received in the last six months. Furthermore, specific models---generated using data from repositories that share the same growth…
