Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries
Alexandros Tsakpinis, Alexander Pretschner

TL;DR
This study examines how accessible GitHub repositories are for PyPI and NPM libraries, revealing varying levels of repository URL availability and highlighting the importance of maintaining accurate links for monitoring library health.
Contribution
It provides a comprehensive analysis of repository URL accessibility in PyPI and NPM ecosystems using dependency and page rank analysis, identifying key issues and potential improvements.
Findings
Up to 73.8% of PyPI and 69.4% of NPM libraries have repository URLs.
Within dependency chains, up to 80.1% of PyPI and 81.1% of NPM libraries have repository URLs.
Having no URL assigned at all is a common reason for invalid links, reaching up to 39.6% in NPM.
Abstract
Industrial applications rely heavily on open-source software (OSS) libraries, which provide various benefits. However, these libraries can also present a substantial risk if a vulnerability or attack arises and the community, due to inactivity, fails to promptly address the issue and release a fix. To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. Based on these repositories, the libraries integrated into an application can be monitored to observe whether they are adequately maintained. In this descriptive study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries. For all available libraries, we extract the assigned repository URLs and direct dependencies, and use the page rank algorithm to comprehensively analyze the ecosystems from a library and dependency chain perspective. For invalid…
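The approach outlined in the abstract (rank libraries by page rank over the dependency graph, then check how many of the top-ranked ones have a GitHub repository URL) can be illustrated with a minimal sketch. This is not the authors' implementation: the library names, URLs, and the helpers `page_rank` and `url_coverage` are hypothetical. The edge direction (from a library to the libraries it depends on), the damping factor of 0.85, and the simple prefix check for GitHub URLs are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def page_rank(deps: Dict[str, List[str]], damping: float = 0.85,
              iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank on a dependency graph.

    Assumption: an edge points from a library to each library it depends on,
    so widely depended-upon libraries accumulate a high score.
    """
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Rank held by libraries without dependencies is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not deps.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src, targets in deps.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def url_coverage(rank: Dict[str, float],
                 repo_urls: Dict[str, Optional[str]], top_k: int) -> float:
    """Share of the top_k ranked libraries whose URL looks like a GitHub repo.

    Assumption: a plain prefix check stands in for real URL validation.
    """
    top = sorted(rank, key=rank.get, reverse=True)[:top_k]
    with_url = sum(
        1 for lib in top
        if (repo_urls.get(lib) or "").startswith("https://github.com/")
    )
    return with_url / len(top)


# Hypothetical toy ecosystem: direct dependencies and assigned repository URLs.
deps = {
    "app-a": ["core", "util"],
    "app-b": ["core"],
    "util": ["core"],
    "core": [],
}
repo_urls = {
    "core": "https://github.com/example/core",
    "util": None,   # no URL assigned at all
    "app-a": "https://github.com/example/app-a",
    "app-b": "",    # empty URL field
}

rank = page_rank(deps)
print(url_coverage(rank, repo_urls, top_k=1))
print(url_coverage(rank, repo_urls, top_k=4))
```

Varying `top_k` mirrors the idea that accessibility depends on how important the analyzed libraries are: in this toy data, coverage is measured first for the single highest-ranked library and then for the whole ecosystem.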
