Forecasting the Maintained Score from the OpenSSF Scorecard: A Study of GitHub Repositories Linked to PyPI Packages
Alexandros Tsakpinis, Efe Berk Ergüleç, Emil Schwenger, Alexander Pretschner

TL;DR
This study explores forecasting future maintenance activity of open-source repositories linked to PyPI packages using time series models, demonstrating high accuracy with simple methods.
Contribution
The paper introduces a method to predict future OpenSSF Maintained scores by combining historical data with machine learning and deep learning models, and it highlights the effectiveness of simple models.
Findings
Forecasting accuracy exceeds 0.95 for bucketed scores.
Simple machine learning models perform comparably to deep learning.
Aggregated representations improve forecast accuracy.
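To make the "bucketed scores" finding concrete, the following is a minimal sketch of how accuracy on bucketed Maintained scores could be computed. It is not the authors' method. The bucket edges (`low`, `medium`, `high`), the helper names, and the sample numbers are all hypothetical, since this summary does not state them. The only fact relied on is that Scorecard check scores range from 0 to 10. The forecast shown is a naive persistence baseline (carry the last observed score forward), used here purely for illustration.

```python
from typing import List, Sequence


def bucket(score: float) -> str:
    """Map a 0-10 Maintained score to a coarse bucket.

    The edges below are an assumption for illustration only;
    the source does not state which buckets the paper uses.
    """
    if score < 4:
        return "low"
    if score < 7:
        return "medium"
    return "high"


def bucketed_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of time steps where forecast and truth fall in the same bucket."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    hits = sum(bucket(a) == bucket(p) for a, p in zip(actual, predicted))
    return hits / len(actual)


def persistence_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Naive baseline: repeat the last observed score for every future step."""
    return [history[-1]] * horizon


# Hypothetical score history for one repository (not data from the paper).
history = [10, 10, 9, 8]
actual_future = [8, 7, 6]

forecast = persistence_forecast(history, len(actual_future))
accuracy = bucketed_accuracy(actual_future, forecast)
print(f"forecast={forecast}, bucketed accuracy={accuracy:.2f}")
```

Bucketing makes the evaluation more forgiving than exact-score matching: a forecast of 8 for a true score of 7 still counts as correct under these assumed edges, while a forecast of 8 for a true score of 6 does not.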
Abstract
Background: The OpenSSF Scorecard is widely used to assess the security posture of open-source software repositories, with the Maintained metric serving as a key indicator of recent maintenance activities, helping users identify actively maintained projects and potentially abandoned dependencies. However, the metric is inherently retrospective, providing only a short-term snapshot based on the past 90 days of repository activity and offering no insight into the future. This limitation complicates risk assessment for developers and organizations that rely on open-source dependencies. Aims: In this paper, we investigate the feasibility of forecasting future maintenance activities as captured by the OpenSSF Maintained score. Method: Focusing on 3,220 GitHub repositories linked to one of the top 1% most central PyPI libraries, as ranked by PageRank, we reconstruct historical Maintained…
