Predicting Abandonment of Open Source Software Projects with An Integrated Feature Framework
Yiming Xu, Runzhi He, Hengzhi Ye, Minghui Zhou, Huaimin Wang

TL;DR
This paper presents a scalable, interpretable framework for predicting OSS project abandonment using a large dataset, multi-perspective features, and survival analysis, improving prediction accuracy and practical applicability.
Contribution
It introduces an integrated feature framework and a precise labeling pipeline, enabling high-performance, interpretable abandonment prediction for large OSS datasets.
Findings
Survival analysis achieves a C-index of 0.846, outperforming surface feature models.
The framework effectively identifies at-risk OSS projects with high accuracy.
Practical deployment demonstrates utility in openEuler package risk assessment.
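For readers unfamiliar with the metric behind the 0.846 figure, the following is a minimal sketch of Harrell's concordance index (C-index), the standard way to score survival models. It is not the authors' evaluation code: the function name, the toy data, and the simplified handling of tied event times (such pairs are skipped) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  -- observed time for each project (abandonment or censoring)
    events -- 1 if abandonment was observed, 0 if the project is censored
    risks  -- predicted risk score; higher means abandonment is expected sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the project with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        # Pairs with tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # model ranked the earlier abandonment as riskier
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four projects, the third one is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a C-index of 0.846 means the model orders roughly 85% of comparable project pairs correctly.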
Abstract
Open Source Software (OSS) is a cornerstone of contemporary software development, yet the increasing prevalence of OSS project abandonment threatens global software supply chains. Although previous research has explored abandonment prediction methods, these methods often demonstrate unsatisfactory predictive performance, further plagued by imprecise abandonment discrimination, limited interpretability, and a lack of large, generalizable datasets. In this work, we address these challenges by reliably detecting OSS project abandonment through a dual approach: explicit archival status and rigorous semantic analysis of project documentation or description. Leveraging a precise and scalable labeling pipeline, we curate a comprehensive longitudinal dataset of 115,466 GitHub repositories, encompassing 57,733 confirmed abandoned repositories, enriched with detailed, timeline-based behavioral…
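The dual labeling approach described in the abstract (explicit archival status plus analysis of the documentation or description) can be pictured with a rough sketch. This is not the paper's pipeline: the function name, the return labels, and the keyword patterns are all hypothetical, and plain regular-expression matching stands in for the "rigorous semantic analysis" the authors mention.

```python
import re

# Hypothetical phrases that often signal a declared end of maintenance.
# A crude stand-in for semantic analysis; the paper's actual method is not shown here.
_DECLARATION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\b(?:no longer|not)\s+(?:actively\s+)?(?:maintained|supported)\b",
        r"\bunmaintained\b",
        r"\bdeprecated\b",
        r"\babandoned\b",
    )
]


def abandonment_signal(archived, readme="", description=""):
    """Return 'archived', 'declared', or None for a repository (illustrative)."""
    # Signal 1: the repository is explicitly archived.
    if archived:
        return "archived"
    # Signal 2: the documentation or description declares abandonment.
    text = f"{description}\n{readme}"
    for pattern in _DECLARATION_PATTERNS:
        if pattern.search(text):
            return "declared"
    return None


# Hypothetical usage with made-up repository text.
label = abandonment_signal(False, readme="This project is no longer maintained.")
```

Keyword matching of this kind produces false positives (for example, a README that mentions a deprecated API), which is presumably why the abstract stresses rigorous semantic analysis over surface cues.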
