Predicting Health Indicators for Open Source Projects (using Hyperparameter Optimization)
Tianpei Xia, Wei Fu, Rui Shu, Rishabh Agrawal, Tim Menzies

TL;DR
This study demonstrates that hyperparameter optimization significantly improves the accuracy of predicting health indicators in open-source projects using large-scale GitHub data.
Contribution
It is the first large-scale study to apply hyperparameter optimization to predict multiple project health indicators accurately.
Findings
Hyperparameter tuning reduces prediction errors substantially.
Traditional algorithms like KNN and SVR have high error rates without optimization.
Large-scale data enables effective prediction of project health status.
Abstract
Software developed on public platforms is a source of data that can be used to make predictions about those projects. While individual development activity may be random and hard to predict, development behavior at the project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April 2020). We find that traditional estimation algorithms make many mistakes. Algorithms like k-nearest neighbors (KNN), support vector regression (SVR), random forest (RFT), linear regression (LNR), and regression trees (CART) have high error rates. But that error rate can be greatly reduced using hyperparameter optimization. To the best of our knowledge, this is the largest study yet conducted,…
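To make the idea of hyperparameter optimization concrete, here is a minimal sketch, not the authors' method. It assumes a synthetic monthly series as a stand-in for a project health indicator, a hand-written k-nearest-neighbors regressor, mean absolute error as the metric, and plain random search as the optimizer. None of those choices come from the abstract, and all function names are hypothetical.

```python
import math
import random

def knn_predict(train, query, k, weighted):
    """Predict a target for `query` from its k nearest training rows."""
    # `train` is a list of (features, target) pairs; distance is Euclidean.
    nearest = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), y)
        for x, y in train
    )[:k]
    if weighted:
        # Closer neighbors count more; the small constant avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def mean_abs_error(train, test, k, weighted):
    """Average absolute prediction error over the rows in `test`."""
    errors = [abs(knn_predict(train, x, k, weighted) - y) for x, y in test]
    return sum(errors) / len(errors)

def make_series(n, rng):
    """Synthetic monthly values (an assumed stand-in for a health indicator)."""
    return [20 + 5 * math.sin(t / 3) + rng.gauss(0, 1) for t in range(n)]

def to_rows(series, window=3):
    """Turn a series into (last `window` values -> next value) rows."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def tune(train, valid, budget, rng):
    """Random search over k and the weighting flag, starting from the default."""
    best = {"k": 5, "weighted": False}
    best_err = mean_abs_error(train, valid, **best)
    for _ in range(budget):
        cand = {"k": rng.randint(1, 15), "weighted": rng.random() < 0.5}
        err = mean_abs_error(train, valid, **cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

rng = random.Random(0)
rows = to_rows(make_series(60, rng))
train, valid = rows[:40], rows[40:]  # earlier months train, later months validate
default_err = mean_abs_error(train, valid, k=5, weighted=False)
best, tuned_err = tune(train, valid, budget=30, rng=rng)
print(f"default MAE={default_err:.2f}  tuned MAE={tuned_err:.2f}  config={best}")
```

Because the search starts from the default configuration, the tuned error on the validation rows can never be worse than the default error. A fair comparison would score the chosen configuration on a further held-out span of months, since the validation rows were used to pick it.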
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Engineering Techniques and Practices
