Identifying Unmaintained Projects in GitHub
Jailton Coelho, Marco Tulio Valente, Luciana L. Silva, Emad Shihab

TL;DR
This paper presents a machine learning approach to identify unmaintained GitHub projects, so that users can assess the risk of depending on them and other developers may be encouraged to take over maintenance. The approach was validated with feedback from project developers and reports a precision of 80% and a recall of 96%.
Contribution
It introduces a machine learning model that detects unmaintained or sparsely maintained projects from activity features (commits, forks, issues, etc.), supporting the sustainability of open source software.
Findings
Precision of 80% in identifying unmaintained projects
Recall of 96%, meaning few unmaintained projects go undetected
Model helps assess project maintenance risks
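To make the two metrics above concrete, here is a minimal sketch of how precision and recall are computed from classification counts. The counts in the example are hypothetical, chosen only so the arithmetic reproduces 80% and 96%; they are not taken from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a binary classifier.

    precision = TP / (TP + FP): share of projects flagged as unmaintained
                that really are unmaintained.
    recall    = TP / (TP + FN): share of truly unmaintained projects
                that the classifier flagged.
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (not from the paper): 96 correct flags,
# 24 false alarms, 4 unmaintained projects missed.
precision, recall = precision_recall(true_pos=96, false_pos=24, false_neg=4)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Read this way, a precision of 80% means roughly one in five flagged projects is a false alarm, while a recall of 96% means only a small fraction of unmaintained projects is missed.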
Abstract
Background: Open source software is increasingly important in modern software development. However, there is also a growing concern about the sustainability of such projects, which are usually managed by a small number of developers, frequently working as volunteers. Aims: In this paper, we propose an approach to identify GitHub projects that are not actively maintained. Our goal is to alert users about the risks of using these projects and possibly motivate other developers to assume the maintenance of the projects. Method: We train machine learning models to identify unmaintained or sparsely maintained projects, based on a set of features about project activity (commits, forks, issues, etc.). We empirically validate the best-performing model with the principal developers of 129 GitHub projects. Results: The proposed machine learning approach has a precision of 80%, based on…
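As an illustration of what "features about project activity" might look like in practice, the sketch below counts events of each kind within a recent time window. The function name `activity_features`, the 90-day window, the event format, and the feature names are all assumptions made for this example; the abstract does not specify how the authors compute their features.

```python
from collections import Counter
from datetime import date, timedelta


def activity_features(events, as_of, window_days=90,
                      kinds=("commit", "fork", "issue")):
    """Count events of each kind in the window (as_of - window_days, as_of].

    `events` is an iterable of (kind, day) pairs, where `day` is a
    datetime.date. The layout and window size are illustrative assumptions.
    """
    cutoff = as_of - timedelta(days=window_days)
    counts = Counter(
        kind for kind, day in events
        if kind in kinds and cutoff < day <= as_of
    )
    # One feature per event kind; kinds with no recent events get 0.
    return {f"{kind}s_last_{window_days}d": counts.get(kind, 0)
            for kind in kinds}


# Hypothetical event history for one project.
events = [
    ("commit", date(2019, 12, 20)),
    ("commit", date(2019, 6, 1)),    # outside the 90-day window
    ("issue", date(2019, 11, 5)),
    ("fork", date(2018, 1, 1)),      # outside the 90-day window
]
features = activity_features(events, as_of=date(2020, 1, 1))
print(features)
```

A feature dictionary like this, computed for each project, could then be fed to any standard classifier; which classifier and which windows work best is a question the paper itself addresses.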
